pith. machine review for the scientific record.

arxiv: 2604.19048 · v1 · submitted 2026-04-21 · 💻 cs.CL · cs.AI

Recognition: unknown

SAMoRA: Semantic-Aware Mixture of LoRA Experts for Task-Adaptive Learning

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 02:23 UTC · model grok-4.3

classification 💻 cs.CL cs.AI
keywords mixture of experts · LoRA · parameter-efficient fine-tuning · multi-task learning · large language models · semantic routing · task-adaptive scaling

The pith

SAMoRA routes inputs to specialized LoRA experts via semantic matching and scales their contributions by task complexity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents SAMoRA as a parameter-efficient fine-tuning method that combines mixture-of-experts with low-rank adaptation for large language models. It tackles imprecise routing by introducing a semantic-aware router that aligns the semantics of input text directly with expert modules, and it adds task-adaptive scaling to vary each expert's influence according to the demands of different tasks. A regularization term further encourages experts to specialize while keeping scaling effective. Experiments across multi-task benchmarks show gains over prior methods and stronger generalization to unseen tasks.
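
To make the moving parts concrete, here is a minimal sketch of how such a layer could be wired, assuming routing is driven by a pooled input embedding matched against learned per-expert key vectors and scaling by a per-task scalar. All names, shapes, and ranges (top-k cosine routing, a (0, 2) scale range) are illustrative assumptions, not the authors' implementation; their code is linked from the abstract.

```python
# Illustrative sketch only: not the authors' implementation.
# Assumes: routing = cosine similarity between a pooled input embedding and
# learned per-expert key vectors; scaling = a learnable per-task scalar.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixtureOfLoRAExperts(nn.Module):
    def __init__(self, d_model: int, num_experts: int = 4, rank: int = 8,
                 num_tasks: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # One low-rank (A, B) pair per expert, applied on top of a frozen base weight.
        self.A = nn.Parameter(torch.randn(num_experts, rank, d_model) * 0.02)
        self.B = nn.Parameter(torch.zeros(num_experts, d_model, rank))
        # Semantic routing: a learned key per expert, matched against the input summary.
        self.expert_keys = nn.Parameter(torch.randn(num_experts, d_model) * 0.02)
        # Task-adaptive scaling: one learnable logit per task, mapped to (0, 2).
        self.task_scale_logits = nn.Parameter(torch.zeros(num_tasks))

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        # x: (batch, seq, d_model). Pool tokens into a sequence-level semantic summary.
        pooled = x.mean(dim=1)                                              # (batch, d)
        sims = F.cosine_similarity(pooled.unsqueeze(1),
                                   self.expert_keys.unsqueeze(0), dim=-1)  # (batch, E)
        top_vals, top_idx = sims.topk(self.top_k, dim=-1)
        gates = F.softmax(top_vals, dim=-1)                                 # (batch, k)

        # Task-adaptive scaling factor for this task.
        alpha = 2.0 * torch.sigmoid(self.task_scale_logits[task_id])

        delta = torch.zeros_like(x)
        for k in range(self.top_k):
            A_k = self.A[top_idx[:, k]]                            # (batch, rank, d)
            B_k = self.B[top_idx[:, k]]                            # (batch, d, rank)
            low_rank = torch.einsum("bsd,brd->bsr", x, A_k)        # (batch, seq, rank)
            update = torch.einsum("bsr,bdr->bsd", low_rank, B_k)   # (batch, seq, d)
            delta = delta + gates[:, k].view(-1, 1, 1) * update

        # The frozen base projection is added by the host layer; this module
        # returns only the scaled mixture-of-LoRA-experts update.
        return alpha * delta
```

The paper's actual router and scaling parameterization may differ; this sketch only fixes the data flow that the summary above describes.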

Core claim

SAMoRA introduces a Semantic-Aware Router that explicitly matches textual semantics to the most suitable LoRA experts for precise routing, a Task-Adaptive Scaling mechanism that dynamically adjusts expert contributions according to task-specific requirements, and a regularization objective that jointly promotes expert specialization and effective scaling, enabling better task-adaptive learning in large language models.

What carries the argument

The Semantic-Aware Router, which aligns input semantics to experts for routing decisions, together with the Task-Adaptive Scaling mechanism that regulates each expert's weight based on task needs.

If this is right

  • Expert modules become more specialized because routing now follows explicit semantic cues rather than semantics-agnostic gating.
  • Task performance improves when scaling factors adapt to varying task complexity instead of using fixed fusion weights.
  • Parameter efficiency is maintained while generalization to new tasks increases due to the joint regularization on specialization and scaling (an illustrative stand-in is sketched just after this list).
  • Routing decisions become interpretable through the semantic alignment step, reducing the black-box nature of prior MoE-LoRA approaches.
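
The regularizer referenced in the third point is not reproduced on this page, so the following is a purely illustrative stand-in: a per-example entropy term to sharpen routing, a usage-balance term to keep experts from collapsing onto one module, and a penalty keeping task scales near a neutral value. None of these names or functional forms come from the paper.

```python
# Illustrative stand-in for a "specialization + scaling" regularizer (assumption,
# not the paper's objective).
import torch

def specialization_scaling_penalty(gates: torch.Tensor,
                                   task_scales: torch.Tensor,
                                   lam_spec: float = 0.01,
                                   lam_scale: float = 0.01) -> torch.Tensor:
    """gates: (batch, num_experts) routing probabilities; task_scales: (num_tasks,)."""
    # Specialization: low per-example entropy means each input commits to few experts.
    entropy = -(gates.clamp_min(1e-9).log() * gates).sum(dim=-1).mean()
    # Balance: mean usage should not collapse onto a single expert (equals 1 when uniform).
    usage = gates.mean(dim=0)
    balance = (usage * usage).sum() * gates.shape[-1]
    # Scaling: keep task scales from drifting far from 1 (a neutral update strength).
    scale_penalty = ((task_scales - 1.0) ** 2).mean()
    return lam_spec * (entropy + balance) + lam_scale * scale_penalty
```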

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same semantic router could be tested on instruction-tuning datasets where task boundaries are implicit rather than labeled.
  • If the router generalizes across model sizes, it might reduce the need to train separate expert pools for each downstream domain.
  • Combining the scaling mechanism with other adaptation techniques such as prompt tuning could be explored to further lower memory use.

Load-bearing premise

The semantic-aware router can reliably map textual inputs to the correct experts without routing mistakes or large added computation costs.

What would settle it

Running SAMoRA on the same multi-task benchmarks and observing either no accuracy gain over standard MoE-LoRA baselines or a measurable increase in routing errors would falsify the central claims.
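
Auditing the second failure mode requires an operational notion of a routing error. One simple, hypothetical proxy (not defined by the paper) is the rate at which the router's top-1 expert for an example disagrees with the majority expert selected for that example's task:

```python
# Hypothetical audit metric: how often does the router's top-1 expert for an
# example disagree with the most common expert chosen for that example's task?
from collections import Counter, defaultdict

def routing_disagreement_rate(top1_experts: list[int], task_labels: list[int]) -> float:
    by_task = defaultdict(list)
    for expert, task in zip(top1_experts, task_labels):
        by_task[task].append(expert)
    majority = {t: Counter(es).most_common(1)[0][0] for t, es in by_task.items()}
    mismatches = sum(e != majority[t] for e, t in zip(top1_experts, task_labels))
    return mismatches / max(len(top1_experts), 1)

# Example: perfectly task-consistent routing gives 0.0.
print(routing_disagreement_rate([0, 0, 1, 1], [7, 7, 3, 3]))  # 0.0
```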

Figures

Figures reproduced from arXiv: 2604.19048 by Boyan Shi, Huaiyu Wan, Junfeng Shen, Shaojiang Wang, Shengnan Guo, Shuyuan Zhao, Wei Chen.

Figure 1
Figure 1. Illustration of limitations in existing mechanisms. (a) MLP-based Routing: Fails to explicitly match tasks with expert capabilities, resulting in expert homogenization. (b) Uniform Weight Fusion: Applies a uniform update strength across diverse tasks, ignoring specific requirements and limiting multi-task generalization. view at source ↗
Figure 2
Figure 2. Overview of our SAMoRA. We design a Semantic-Aware Router and a Task-Adaptive Scaling mechanism, … view at source ↗
Figure 3
Figure 3. PCA visualization of expert features extracted … view at source ↗
Figure 4
Figure 4. Visualization of task scaling factors across … view at source ↗
Figure 5
Figure 5. Visualization of expert activation patterns on the unseen MMLU benchmark. The top row (MLP Router) … view at source ↗
Figure 6
Figure 6. Sensitivity analysis on hyperparameters, evalu… view at source ↗
read the original abstract

The combination of Mixture-of-Experts (MoE) and Low-Rank Adaptation (LoRA) has shown significant potential for enhancing the multi-task learning capabilities of Large Language Models. However, existing methods face two primary challenges: (1)Imprecise Routing in the current MoE-LoRA method fails to explicitly match input semantics with expert capabilities, leading to weak expert specialization. (2)Uniform weight fusion strategies struggle to provide adaptive update strengths, overlooking the varying complexity of different tasks. To address these limitations, we propose SAMoRA (Semantic-Aware Mixture of LoRA Experts), a novel parameter-efficient fine-tuning framework tailored for task-adaptive learning. Specifically, A Semantic-Aware Router is proposed to explicitly align textual semantics with the most suitable experts for precise routing. A Task-Adaptive Scaling mechanism is designed to regulate expert contributions based on specific task requirements dynamically. In addition, a novel regularization objective is proposed to jointly promote expert specialization and effective scaling. Extensive experiments on multiple multi-task benchmarks demonstrate that SAMoRA significantly outperforms the state-of-the-art methods and holds excellent task generalization capabilities. Code is available at https://github.com/boyan-code/SAMoRA

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript introduces SAMoRA, a parameter-efficient fine-tuning framework for large language models that extends Mixture-of-Experts LoRA. It proposes a Semantic-Aware Router to align input textual semantics with suitable LoRA experts, a Task-Adaptive Scaling mechanism to dynamically adjust expert contributions according to task requirements, and a joint regularization objective to encourage expert specialization and effective scaling. The central claim is that extensive experiments on multiple multi-task benchmarks demonstrate significant outperformance over state-of-the-art methods together with strong task generalization.

Significance. If the reported empirical gains hold under rigorous evaluation, the work would advance parameter-efficient multi-task adaptation of LLMs by addressing imprecise routing and non-adaptive fusion in prior MoE-LoRA approaches. The public code release supports reproducibility and is a clear strength.

minor comments (3)
  1. Abstract: missing space after the numeral in “(1)Imprecise Routing” and “(2)Uniform weight fusion”.
  2. The experimental claims of “significantly outperforms” and “excellent task generalization” would benefit from explicit reference to the specific tables or figures that report the quantitative deltas, standard deviations, and statistical tests.
  3. Notation for the router and scaling functions could be introduced with a single consolidated equation block rather than scattered inline definitions.
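
To make item 3 concrete, a consolidated block might read roughly as follows, where the k_i are learned expert keys, τ a routing temperature, w_t a per-task scaling logit, and c a scale-range constant; the symbols and functional forms are illustrative placeholders inferred from the abstract, not the paper's notation.

```latex
% Illustrative consolidated notation (placeholders, not the paper's own symbols).
\begin{align}
  \mathbf{s} &= \mathrm{Pool}(\mathbf{X}) \in \mathbb{R}^{d}
      && \text{semantic summary of the input} \\
  g_i &= \frac{\exp\!\big(\mathrm{sim}(\mathbf{s}, \mathbf{k}_i)/\tau\big)}
              {\sum_{j=1}^{E} \exp\!\big(\mathrm{sim}(\mathbf{s}, \mathbf{k}_j)/\tau\big)}
      && \text{semantic-aware routing weight for expert } i \\
  \alpha_t &= c \,\sigma(w_t)
      && \text{task-adaptive scaling factor for task } t \\
  \Delta \mathbf{W} &= \alpha_t \sum_{i=1}^{E} g_i\, \mathbf{B}_i \mathbf{A}_i,
      \qquad \mathbf{W}' = \mathbf{W}_0 + \Delta \mathbf{W}
      && \text{scaled mixture of LoRA experts}
\end{align}
```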

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of SAMoRA and for recommending minor revision. We appreciate the recognition that our semantic-aware routing, task-adaptive scaling, and joint regularization address key limitations in prior MoE-LoRA methods, and we value the note on reproducibility via the public code release.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper introduces SAMoRA as a new PEFT framework with three explicitly defined components: a Semantic-Aware Router for matching semantics to experts, a Task-Adaptive Scaling mechanism, and a joint regularization objective. These are architectural and training choices, not derived quantities. The central claim of superior performance rests on empirical results across multi-task benchmarks rather than any equation or prediction that reduces by construction to fitted parameters, self-citations, or renamed inputs. No load-bearing derivation step collapses to a tautology or self-referential fit.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The framework rests on standard neural-network training assumptions plus two domain-specific premises about semantic matching and task complexity; no new physical entities are postulated.

free parameters (2)
  • Number of LoRA experts
    Hyperparameter that must be chosen; affects routing granularity and is not derived from first principles.
  • Regularization coefficients
    Weights balancing specialization and scaling objectives; selected during training.
axioms (2)
  • domain assumption Textual semantics can be extracted by a lightweight router network and used to select experts accurately
    Invoked in the description of the Semantic-Aware Router.
  • domain assumption Task complexity varies in a way that can be captured by a dynamic scaling factor
    Underlying the Task-Adaptive Scaling mechanism.
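
For concreteness, the two free parameters named above would typically surface in a training configuration along these lines; all field names are hypothetical and the extra fields are only the usual companions of a LoRA setup.

```python
# Hypothetical configuration surface for the two free parameters listed above.
from dataclasses import dataclass

@dataclass
class SAMoRATrainConfig:
    num_experts: int = 4                  # number of LoRA experts; sets routing granularity
    lora_rank: int = 8                    # standard LoRA rank, shared across experts
    lambda_specialization: float = 0.01   # regularization weight on the specialization term
    lambda_scaling: float = 0.01          # regularization weight on the scaling term
    learning_rate: float = 2e-4

config = SAMoRATrainConfig(num_experts=8, lambda_specialization=0.05)
```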

pith-pipeline@v0.9.0 · 5527 in / 1376 out tokens · 48191 ms · 2026-05-10T02:23:54.227244+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

62 extracted references · 30 canonical work pages · 8 internal anchors

  1. [1]

    Communication, Simulation, and Intelligent Agents: Implications of Personal Intelligent Machines for Medical Education

    Clancey, William J. Communication, Simulation, and Intelligent Agents: Implications of Personal Intelligent Machines for Medical Education. Proceedings of the Eighth International Joint Conference on Artificial Intelligence (IJCAI-83)

  2. [2]

    Classification Problem Solving

    Clancey, William J. Classification Problem Solving. Proceedings of the Fourth National Conference on Artificial Intelligence

  3. [3]

    , title =

    Robinson, Arthur L. , title =. 1980 , doi =. https://science.sciencemag.org/content/208/4447/1019.full.pdf , journal =

  4. [4]

    New Ways to Make Microcircuits Smaller---Duplicate Entry

    Robinson, Arthur L. New Ways to Make Microcircuits Smaller---Duplicate Entry. Science

  5. [5]

    Clancey and Glenn Rennels , abstract =

    Diane Warner Hasling and William J. Clancey and Glenn Rennels , abstract =. Strategic explanations for a diagnostic consultation system , journal =. 1984 , issn =. doi:https://doi.org/10.1016/S0020-7373(84)80003-6 , url =

  6. [6]

    and Rennels, Glenn R

    Hasling, Diane Warner and Clancey, William J. and Rennels, Glenn R. and Test, Thomas. Strategic Explanations in Consultation---Duplicate. The International Journal of Man-Machine Studies

  7. [7]

    Poligon: A System for Parallel Problem Solving

    Rice, James. Poligon: A System for Parallel Problem Solving

  8. [8]

    Transfer of Rule-Based Expertise through a Tutorial Dialogue

    Clancey, William J. Transfer of Rule-Based Expertise through a Tutorial Dialogue

  9. [9]

    The Engineering of Qualitative Models

    Clancey, William J. The Engineering of Qualitative Models

  10. [10]

    2017 , eprint=

    Attention Is All You Need , author=. 2017 , eprint=

  11. [11]

    Pluto: The 'Other' Red Planet

    NASA. Pluto: The 'Other' Red Planet

  12. [12]

    doi:10.1162/neco.1991.3.1.79

    Robert A. Jacobs and Michael I. Jordan and Steven J. Nowlan and Geoffrey E. Hinton , title =. Neural Comput. , volume =. 1991 , url =. doi:10.1162/NECO.1991.3.1.79 , timestamp =

  13. [13]

    Le and Geoffrey E

    Noam Shazeer and Azalia Mirhoseini and Krzysztof Maziarz and Andy Davis and Quoc V. Le and Geoffrey E. Hinton and Jeff Dean , title =. 5th International Conference on Learning Representations,. 2017 , url =

  14. [14]

    9th International Conference on Learning Representations,

    Dmitry Lepikhin and HyoukJoong Lee and Yuanzhong Xu and Dehao Chen and Orhan Firat and Yanping Huang and Maxim Krikun and Noam Shazeer and Zhifeng Chen , title =. 9th International Conference on Learning Representations,. 2021 , url =

  15. [15]

    William Fedus and Barret Zoph and Noam Shazeer , title =. J. Mach. Learn. Res. , volume =. 2022 , url =

  16. [16]

    Mohammed Muqeeth and Haokun Liu and Colin Raffel , title =. Trans. Mach. Learn. Res. , volume =. 2024 , url =

  17. [17]

    Beyond Distillation: Task-level Mixture-of-Experts for Efficient Inference , booktitle =

    Sneha Kudugunta and Yanping Huang and Ankur Bapna and Maxim Krikun and Dmitry Lepikhin and Minh. Beyond Distillation: Task-level Mixture-of-Experts for Efficient Inference , booktitle =. 2021 , url =. doi:10.18653/V1/2021.FINDINGS-EMNLP.304 , timestamp =

  18. [18]

    CoRR , volume =

    Jingwei Xu and Junyu Lai and Yunpeng Huang , title =. CoRR , volume =. 2024 , url =. doi:10.48550/ARXIV.2405.13053 , eprinttype =. 2405.13053 , timestamp =

  19. [19]

    arXiv preprint arXiv:2307.13269 , year=

    Chengsong Huang and Qian Liu and Bill Yuchen Lin and Tianyu Pang and Chao Du and Min Lin , title =. CoRR , volume =. 2023 , url =. doi:10.48550/ARXIV.2307.13269 , eprinttype =. 2307.13269 , timestamp =

  20. [20]

    Mixture-of-LoRAs: An Efficient Multitask Tuning Method for Large Language Models , booktitle =

    Wenfeng Feng and Chuzhan Hao and Yuewei Zhang and Yu Han and Hao Wang , editor =. Mixture-of-LoRAs: An Efficient Multitask Tuning Method for Large Language Models , booktitle =. 2024 , url =

  21. [21]

    Qidong Liu and Xian Wu and Xiangyu Zhao and Yuanshao Zhu and Derong Xu and Feng Tian and Yefeng Zheng , editor =. When. Proceedings of the 47th International. 2024 , url =. doi:10.1145/3626772.3657722 , timestamp =

  22. [22]

    HydraLoRA: An Asymmetric LoRA Architecture for Efficient Fine-Tuning , booktitle =

    Chunlin Tian and Zhan Shi and Zhijiang Guo and Li Li and Cheng. HydraLoRA: An Asymmetric LoRA Architecture for Efficient Fine-Tuning , booktitle =. 2024 , url =

  23. [23]

    MTL-LoRA: Low-Rank Adaptation for Multi-Task Learning , booktitle =

    Yaming Yang and Dilxat Muhtar and Yelong Shen and Yuefeng Zhan and Jianfeng Liu and Yujing Wang and Hao Sun and Weiwei Deng and Feng Sun and Qi Zhang and Weizhu Chen and Yunhai Tong , editor =. MTL-LoRA: Low-Rank Adaptation for Multi-Task Learning , booktitle =. 2025 , url =. doi:10.1609/AAAI.V39I20.35509 , timestamp =

  24. [24]

    2024 , url =

    Xiao Liu and Yanan Zheng and Zhengxiao Du and Ming Ding and Yujie Qian and Zhilin Yang and Jie Tang , title =. 2024 , url =. doi:10.1016/J.AIOPEN.2023.08.012 , timestamp =

  25. [25]

    Liu , title =

    Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu , title =. J. Mach. Learn. Res. , volume =. 2020 , url =

  26. [26]

    Scaling Laws for Neural Language Models

    Jared Kaplan and Sam McCandlish and Tom Henighan and Tom B. Brown and Benjamin Chess and Rewon Child and Scott Gray and Alec Radford and Jeffrey Wu and Dario Amodei , title =. CoRR , volume =. 2020 , url =. 2001.08361 , timestamp =

  27. [27]

    Is ChatGPT a General-Purpose Natural Language Processing Task Solver? , booktitle =

    Chengwei Qin and Aston Zhang and Zhuosheng Zhang and Jiaao Chen and Michihiro Yasunaga and Diyi Yang , editor =. Is ChatGPT a General-Purpose Natural Language Processing Task Solver? , booktitle =. 2023 , url =. doi:10.18653/V1/2023.EMNLP-MAIN.85 , timestamp =

  28. [28]

    Pangu-{\Sigma}: Towards trillion parameter language model with sparse heterogeneous comput- ing,

    Xiaozhe Ren and Pingyi Zhou and Xinfan Meng and Xinjing Huang and Yadao Wang and Weichao Wang and Pengfei Li and Xiaoda Zhang and Alexander Podolskiy and Grigory Arshinov and Andrey Bout and Irina Piontkovskaya and Jiansheng Wei and Xin Jiang and Teng Su and Qun Liu and Jun Yao , title =. CoRR , volume =. 2023 , url =. doi:10.48550/ARXIV.2303.10845 , epri...

  29. [29]

    Zhao and Kelvin Guu and Adams Wei Yu and Brian Lester and Nan Du and Andrew M

    Jason Wei and Maarten Bosma and Vincent Y. Zhao and Kelvin Guu and Adams Wei Yu and Brian Lester and Nan Du and Andrew M. Dai and Quoc V. Le , title =. The Tenth International Conference on Learning Representations,. 2022 , url =

  30. [30]

    Hu and Yelong Shen and Phillip Wallis and Zeyuan Allen

    Edward J. Hu and Yelong Shen and Phillip Wallis and Zeyuan Allen. LoRA: Low-Rank Adaptation of Large Language Models , booktitle =. 2022 , url =

  31. [31]

    Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge

    Peter Clark and Isaac Cowhey and Oren Etzioni and Tushar Khot and Ashish Sabharwal and Carissa Schoenick and Oyvind Tafjord , title =. CoRR , volume =. 2018 , url =. 1803.05457 , timestamp =

  32. [32]

    Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering

    Todor Mihaylov and Peter Clark and Tushar Khot and Ashish Sabharwal , title =. CoRR , volume =. 2018 , url =. 1809.02789 , timestamp =

  33. [33]

    The Thirty-Fourth

    Yonatan Bisk and Rowan Zellers and Ronan Le Bras and Jianfeng Gao and Yejin Choi , title =. The Thirty-Fourth. 2020 , url =. doi:10.1609/AAAI.V34I05.6239 , timestamp =

  34. [34]

    SocialIQA: Commonsense Reasoning about Social Interactions

    Maarten Sap and Hannah Rashkin and Derek Chen and Ronan Le Bras and Yejin Choi , title =. CoRR , volume =. 2019 , url =. 1904.09728 , timestamp =

  35. [35]

    Bowman , title =

    Alex Wang and Amanpreet Singh and Julian Michael and Felix Hill and Omer Levy and Samuel R. Bowman , title =. 7th International Conference on Learning Representations,. 2019 , url =

  36. [36]

    MMLU-Pro:

    Yubo Wang and Xueguang Ma and Ge Zhang and Yuansheng Ni and Abhranil Chandra and Shiguang Guo and Weiming Ren and Aaran Arulraj and Xuan He and Ziyan Jiang and Tianle Li and Max Ku and Kai Wang and Alex Zhuang and Rongqi Fan and Xiang Yue and Wenhu Chen , editor =. MMLU-Pro:. Advances in Neural Information Processing Systems 38: Annual Conference on Neura...

  37. [37]

    DoRA: Weight-Decomposed Low-Rank Adaptation , booktitle =

    Shih. DoRA: Weight-Decomposed Low-Rank Adaptation , booktitle =. 2024 , url =

  38. [38]

    PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models , booktitle =

    Fanxu Meng and Zhaohui Wang and Muhan Zhang , editor =. PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models , booktitle =. 2024 , url =

  39. [39]

    Psychometrika , volume=

    The approximation of one matrix by another of lower rank , author=. Psychometrika , volume=. 1936 , publisher=

  40. [40]

    Bowman , editor =

    Alex Wang and Yada Pruksachatkun and Nikita Nangia and Amanpreet Singh and Julian Michael and Felix Hill and Omer Levy and Samuel R. Bowman , editor =. SuperGLUE:. Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada , pages =. 2019 , url =

  41. [41]

    Proceedings of the 57th Conference of the Association for Computational Linguistics,

    Rowan Zellers and Ari Holtzman and Yonatan Bisk and Ali Farhadi and Yejin Choi , editor =. HellaSwag: Can a Machine Really Finish Your Sentence? , booktitle =. 2019 , url =. doi:10.18653/V1/P19-1472 , timestamp =

  42. [42]

    Keisuke Sakaguchi and Ronan Le Bras and Chandra Bhagavatula and Yejin Choi , title =. Commun. 2021 , url =. doi:10.1145/3474381 , timestamp =

  43. [43]

    doi: 10.18653/v1/N19-1421

    Alon Talmor and Jonathan Herzig and Nicholas Lourie and Jonathan Berant , editor =. CommonsenseQA:. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies,. 2019 , url =. doi:10.18653/V1/N19-1421 , timestamp =

  44. [44]

    Learning multiple visual domains with residual adapters , booktitle =

    Sylvestre. Learning multiple visual domains with residual adapters , booktitle =. 2017 , url =

  45. [45]

    A Survey of Large Language Models

    Wayne Xin Zhao and Kun Zhou and Junyi Li and Tianyi Tang and Xiaolei Wang and Yupeng Hou and Yingqian Min and Beichen Zhang and Junjie Zhang and Zican Dong and Yifan Du and Chen Yang and Yushuo Chen and Zhipeng Chen and Jinhao Jiang and Ruiyang Ren and Yifan Li and Xinyu Tang and Zikang Liu and Peiyu Liu and Jian. A Survey of Large Language Models , journ...

  46. [46]

    Parameter-efficient fine-tuning methods for pretrained language models: A critical review and assessment

    Parameter-efficient fine-tuning methods for pretrained language models: A critical review and assessment , author=. arXiv preprint arXiv:2312.12148 , year=

  47. [47]

    Advances in neural information processing systems , volume=

    Attention is all you need , author=. Advances in neural information processing systems , volume=

  48. [48]

    arXiv preprint arXiv:2506.14436 , year=

    Shen Yuan and Yin Zheng and Taifeng Wang and Binbin Liu and Hongteng Xu , title =. CoRR , volume =. 2025 , url =. doi:10.48550/ARXIV.2506.14436 , eprinttype =. 2506.14436 , timestamp =

  49. [49]

    The Thirteenth International Conference on Learning Representations,

    Fan Wang and Juyong Jiang and Chansung Park and Sunghun Kim and Jing Tang , title =. The Thirteenth International Conference on Learning Representations,. 2025 , url =

  50. [50]

    arXiv preprint arXiv:2311.11501 , year=

    Yiming Wang and Yu Lin and Xiaodong Zeng and Guannan Zhang , title =. CoRR , volume =. 2023 , url =. doi:10.48550/ARXIV.2311.11501 , eprinttype =. 2311.11501 , timestamp =

  51. [51]

    Llama 2: Open Foundation and Fine-Tuned Chat Models

    Hugo Touvron and Louis Martin and Kevin Stone and Peter Albert and Amjad Almahairi and Yasmine Babaei and Nikolay Bashlykov and Soumya Batra and Prajjwal Bhargava and Shruti Bhosale and Dan Bikel and Lukas Blecher and Cristian Canton. Llama 2: Open Foundation and Fine-Tuned Chat Models , journal =. 2023 , url =. doi:10.48550/ARXIV.2307.09288 , eprinttype ...

  52. [52]

    2023 , eprint=

    Mistral 7B , author=. 2023 , eprint=

  53. [53]

    2024 , eprint=

    DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models , author=. 2024 , eprint=

  54. [54]

    arXiv preprint arXiv:2506.14436 , year=

    MoORE: SVD-based Model MoE-ization for Conflict-and Oblivion-Resistant Multi-Task Adaptation , author=. arXiv preprint arXiv:2506.14436 , year=

  55. [55]

    Qwen3 Technical Report

    Qwen Team , title =. CoRR , volume =. 2025 , url =. doi:10.48550/ARXIV.2505.09388 , eprinttype =. 2505.09388 , timestamp =

  56. [56]

    Commonsenseqa: A question answering challenge targeting commonsense knowledge , author=. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) , pages=

  57. [57]

    Classic4Children: Adapting Chinese Literary Classics for Children with Large Language Model , booktitle =

    Jiali Chen and Xusen Hei and Yuqi Xue and Zihan Wu and Jiayuan Xie and Yi Cai , editor =. Classic4Children: Adapting Chinese Literary Classics for Children with Large Language Model , booktitle =. 2025 , url =. doi:10.18653/V1/2025.FINDINGS-NAACL.133 , timestamp =

  58. [58]

    ExpStar: Towards Automatic Commentary Generation for Multi-discipline Scientific Experiments , booktitle =

    Jiali Chen and Yujie Jia and Zihan Wu and Jinyu Yang and Jianpeng Chen and Xusen Hei and Jiayuan Xie and Yi Cai and Qing Li , editor =. ExpStar: Towards Automatic Commentary Generation for Multi-discipline Scientific Experiments , booktitle =. 2025 , url =. doi:10.1145/3746027.3755756 , timestamp =

  59. [59]

    Spatial-Temporal Knowledge Distillation for Takeaway Recommendation , booktitle =

    Shuyuan Zhao and Wei Chen and Boyan Shi and Liyong Zhou and Shuohao Lin and Huaiyu Wan , editor =. Spatial-Temporal Knowledge Distillation for Takeaway Recommendation , booktitle =. 2025 , url =. doi:10.1609/AAAI.V39I12.33459 , timestamp =

  60. [60]

    GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

    Wenyi Hong and Wenmeng Yu and Xiaotao Gu and Guo Wang and Guobing Gan and Haomiao Tang and Jiale Cheng and Ji Qi and Junhui Ji and Lihang Pan and Shuaiqi Duan and Weihan Wang and Yan Wang and Yean Cheng and Zehai He and Zhe Su and Zhen Yang and Ziyang Pan and Aohan Zeng and Baoxu Wang and Boyan Shi and Changyu Pang and Chenhui Zhang and Da Yin and Fan Yan...

  61. [61]

    The Llama 3 Herd of Models

    Llama Team , title =. CoRR , volume =. 2024 , url =. doi:10.48550/ARXIV.2407.21783 , eprinttype =. 2407.21783 , timestamp =

  62. [62]

    The Thirteenth International Conference on Learning Representations,

    Mengqi Liao and Wei Chen and Junfeng Shen and Shengnan Guo and Huaiyu Wan , title =. The Thirteenth International Conference on Learning Representations,. 2025 , url =