pith.

arxiv: 2604.10603 · v1 · submitted 2026-04-12 · 💻 cs.LG · cs.AI · cs.PF

MoEITS: A Green AI approach for simplifying MoE-LLMs

Pith reviewed 2026-05-10 16:05 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.PF
keywords: MoE · LLM pruning · information theory · model compression · efficient AI · mixture of experts · green AI

The pith

MoEITS prunes experts in Mixture-of-Experts LLMs using information-theoretic measures, producing smaller models that remain accurate on the tested benchmarks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces MoEITS, a new algorithm for simplifying large language models built on a mixture-of-experts architecture. It applies standardized information-theoretic measures to decide which experts can be removed, lowering computational demands and energy use. In tests on Mixtral 8×7B, Qwen1.5-2.7B, and DeepSeek-V2-Lite, the method produces pruned versions that perform well on benchmarks and use fewer resources than models simplified by other current techniques. This matters to readers who want powerful AI systems to run efficiently on available hardware. The work combines a theoretical analysis of the algorithm's complexity with practical comparisons.

Core claim

MoEITS identifies redundant experts in MoE-based large language models via information-theoretic criteria and removes them, resulting in simplified models that retain effectiveness on all tested benchmarks while achieving greater computational efficiency than existing pruning approaches.

What carries the argument

The MoEITS algorithm, which ranks experts by their information contribution using standardized measures and prunes those with the lowest scores.
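The pith describes the criterion only as "rank by information contribution, prune the lowest". The following is a hypothetical Python sketch of one way such a criterion could be built; it is not the authors' code. The per-token scalar summary of each expert's output, the quantile discretization, the Strehl and Ghosh style normalized mutual information I(X;Y)/√(H(X)H(Y)), and the greedy "drop the most redundant expert" rule are all assumptions made for illustration.

```python
import math
import random
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (nats) of a sequence of discrete labels."""
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())


def nmi(x: Sequence[int], y: Sequence[int]) -> float:
    """Normalized mutual information I(X;Y) / sqrt(H(X) H(Y)), in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx == 0.0 or hy == 0.0:
        return 0.0  # a constant signal shares no information with anything
    h_xy = entropy(list(zip(x, y)))  # joint entropy H(X, Y)
    return (hx + hy - h_xy) / math.sqrt(hx * hy)


def quantile_bins(values: Sequence[float], n_bins: int = 8) -> list[int]:
    """Replace each value by its equal-frequency bin index (0 .. n_bins - 1)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(values)
    return bins


def greedy_prune(expert_outputs: Sequence[Sequence[float]], k: int,
                 n_bins: int = 8) -> list[int]:
    """Prune k experts, each time dropping the one most redundant with a survivor.

    expert_outputs[e][t] is a scalar summary (hypothetical, e.g. output norm) of
    expert e on calibration token t. An expert's redundancy is its largest NMI
    with any other remaining expert. Scores are recomputed after every removal,
    so once one member of a duplicated pair is dropped, the other is no longer
    flagged as redundant.
    """
    labels = [quantile_bins(v, n_bins) for v in expert_outputs]
    if not 0 <= k < len(labels):
        raise ValueError("k must leave at least one expert")
    keep = list(range(len(labels)))
    pruned: list[int] = []
    for _ in range(k):
        redundancy = {
            i: max(nmi(labels[i], labels[j]) for j in keep if j != i) for i in keep
        }
        victim = max(redundancy, key=redundancy.get)
        keep.remove(victim)
        pruned.append(victim)
    return pruned


if __name__ == "__main__":
    rng = random.Random(0)
    outputs = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(4)]
    outputs[3] = list(outputs[0])  # expert 3 duplicates expert 0
    print(greedy_prune(outputs, k=1))  # one of the duplicated pair: [0] or [3]
```

With two identical experts, a single pruning step removes one of the pair and keeps the other; that is why the sketch recomputes redundancy after each removal instead of ranking once and cutting the bottom k.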

If this is right

  • Simplified MoE-LLMs need less memory and run inference faster.
  • Energy consumption during model use decreases, aligning with green AI goals.
  • The pruning process itself has low computational complexity.
  • Results hold across all three base models tested: Mixtral 8×7B, Qwen1.5-2.7B, and DeepSeek-V2-Lite.
  • Models remain effective without needing retraining after pruning.
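For a rough sense of scale behind the memory bullet, the sketch below counts the expert weights freed in Mixtral 8×7B if two of the eight experts were removed from every layer. The 2-of-8 ratio is a hypothetical choice, not a reduction rate reported in the paper. The configuration values are those of the public Mixtral 8×7B release, and only the expert FFN weights are counted (router, attention, and embedding weights are ignored).

```python
# Values from the public Mixtral 8x7B configuration (Hugging Face config.json).
HIDDEN_SIZE = 4096         # hidden_size
INTERMEDIATE_SIZE = 14336  # intermediate_size of each expert FFN
NUM_LAYERS = 32            # num_hidden_layers
NUM_EXPERTS = 8            # num_local_experts per layer


def expert_params(hidden: int = HIDDEN_SIZE, ffn: int = INTERMEDIATE_SIZE) -> int:
    """Weights in one SwiGLU expert: three bias-free projections (w1, w2, w3)."""
    return 3 * hidden * ffn


def pruning_savings(pruned_per_layer: int, bytes_per_param: int = 2) -> tuple[int, int]:
    """(parameters, bytes) freed by removing the same number of experts in every layer.

    Only expert FFN weights are counted; the router also shrinks, but only by
    hidden_size * pruned_per_layer weights per layer, which is negligible.
    """
    if not 0 <= pruned_per_layer < NUM_EXPERTS:
        raise ValueError("must keep at least one expert per layer")
    params = pruned_per_layer * NUM_LAYERS * expert_params()
    return params, params * bytes_per_param


if __name__ == "__main__":
    # Hypothetical: drop 2 of 8 experts per layer, weights stored in bf16 (2 bytes).
    params, nbytes = pruning_savings(pruned_per_layer=2)
    print(f"{params:,} parameters, {nbytes / 2**30:.1f} GiB")
    # prints: 11,274,289,152 parameters, 21.0 GiB
```

Because Mixtral's router still activates two experts per token after such a cut, removing experts shrinks the memory footprint directly while per-token compute stays roughly the same; any inference speedup would likely come mostly from lower memory pressure, such as fitting the model on fewer GPUs.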

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar information measures could guide pruning in non-MoE transformer architectures.
  • Integrating MoEITS into training loops might prevent over-provisioning of experts from the start.
  • High-performing models become more feasible to deploy on resource-limited devices.
  • Future work could explore combining this with other compression methods like distillation.

Load-bearing premise

Standardized information-theoretic measures can accurately identify experts that are dispensable without harming the model's ability to handle new or unseen tasks.

What would settle it

Running the simplified models on a completely new benchmark or domain and observing a significant drop in performance compared to the original model.
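A "significant drop" needs a test. One standard option, assumed here rather than taken from the paper, is a paired bootstrap over benchmark items: resample items with replacement, recompute the accuracy difference between the original and the pruned model, and read off a percentile interval. The correctness vectors in the usage example are made up.

```python
import random
from typing import Sequence


def paired_bootstrap_drop(original: Sequence[int], pruned: Sequence[int],
                          n_resamples: int = 10_000,
                          seed: int = 0) -> tuple[float, float, float]:
    """Accuracy drop (original minus pruned) and a 95% percentile bootstrap interval.

    original[i] and pruned[i] are 1 if that model answered item i of the new
    benchmark correctly, else 0. Items are resampled as pairs, so the interval
    accounts for both models being scored on the same questions.
    """
    if len(original) != len(pruned) or not original:
        raise ValueError("need two non-empty correctness vectors of equal length")
    n = len(original)
    diffs = [o - p for o, p in zip(original, pruned)]
    rng = random.Random(seed)
    stats = sorted(
        sum(diffs[rng.randrange(n)] for _ in range(n)) / n
        for _ in range(n_resamples)
    )
    lower = stats[int(0.025 * n_resamples)]
    upper = stats[int(0.975 * n_resamples) - 1]
    return sum(diffs) / n, lower, upper


if __name__ == "__main__":
    # Made-up per-item correctness for 100 questions from an unseen domain.
    original = [1] * 80 + [0] * 20
    pruned = [1] * 60 + [0] * 40
    drop, lower, upper = paired_bootstrap_drop(original, pruned)
    print(f"drop = {drop:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```

If the lower end of the interval is above zero, the pruned model is worse on that benchmark at roughly the 95% level, which is the kind of evidence that would count against the load-bearing premise.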

Figures

Figures reproduced from arXiv: 2604.10603 by José M. Benítez, Luis Balderas, Miguel Lastra.

Figure 1. Diagram of MoEITS. Starting from a block of model experts, the redundancy analysis metric …
Figure 2. Evolution of the simplified Qwen1.5-2.7B model with MoEITS for different values of …
Original abstract

Large language models are transforming all areas of academia and industry, attracting the attention of researchers, professionals, and the general public. In the quest for more powerful architectures, Mixture-of-Experts (MoE) models, inspired by ensemble models, have emerged as one of the most effective approaches. However, they impose a high computational burden for both training and inference. To reduce the computing and memory footprint as well as the energy consumption, simplification methods have emerged as very effective procedures. In this paper, an original algorithm, MoEITS, for MoE-LLM simplification is presented. The algorithm is characterized by a refined simplicity, underpinned by standardized Information Theoretic frameworks. MoEITS is analyzed in depth from theoretical and practical points of view, including a study of its computational complexity. Its performance, in terms of the accuracy of the simplified LLMs and the reduction rate achieved, is assessed through a thoroughly designed set of experiments. This empirical evaluation includes a comparison with state-of-the-art MoE-LLM pruning methods applied to Mixtral 8×7B, Qwen1.5-2.7B, and DeepSeek-V2-Lite. The extensive experimentation demonstrates that MoEITS outperforms state-of-the-art techniques by generating models that are both effective across all benchmarks and computationally efficient. The code implementing the method will be available at https://github.com/luisbalru/MoEITS.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces MoEITS, an algorithm for simplifying Mixture-of-Experts LLMs via standardized information-theoretic measures (entropy, mutual information, etc.) to prune experts. It provides a theoretical analysis of the method, studies its computational complexity, and reports empirical results claiming that the resulting models outperform prior state-of-the-art pruning techniques in both accuracy and efficiency on standard benchmarks. Experiments are conducted on Mixtral 8×7B, Qwen1.5-2.7B, and DeepSeek-V2-Lite, with code promised for release.

Significance. If the central claims hold, the work offers a principled, information-theoretic route to reducing the inference cost and energy footprint of MoE architectures without sacrificing benchmark performance. The explicit complexity analysis and promised open-source implementation are positive features that would aid reproducibility and adoption in green-AI research.

major comments (2)
  1. Experimental evaluation (throughout the empirical sections): the reported benchmarks are confined to standard in-distribution test sets used by prior pruning methods. No out-of-distribution, domain-shift, or cross-task splits are presented, leaving open the possibility that the IT-based importance scores primarily capture in-distribution activation frequency rather than intrinsic expert utility. This directly affects the load-bearing claim that pruned models remain “effective across all benchmarks.”
  2. Section describing the pruning criterion: the paper asserts that standardized IT measures reliably identify removable experts, yet provides no ablation showing how these scores behave when the input distribution is altered (e.g., via synthetic distribution shifts or held-out domains). Without such evidence, the superiority over baselines may be an artifact of the evaluation distribution.
minor comments (2)
  1. The abstract and introduction repeatedly use “simplification” and “pruning” interchangeably; a brief clarifying sentence on the precise relationship would improve readability.
  2. Figure captions and table headers should explicitly state the number of runs and any statistical significance tests performed, rather than reporting single-point metrics.
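The ablation requested in major comment 2 can be made concrete without committing to MoEITS's exact scoring rule. Given any per-expert importance scores computed on two calibration sets (for instance, an in-distribution set and a held-out domain), a minimal sketch compares their Spearman rank correlation and the Jaccard overlap of the experts each set would prune. The score values in the usage example are hypothetical.

```python
from typing import Sequence


def average_ranks(scores: Sequence[float]) -> list[float]:
    """1-based ranks; tied scores share the mean of the ranks they span."""
    order = sorted(range(len(scores)), key=scores.__getitem__)
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two score vectors of equal length >= 2")
    ra, rb = average_ranks(a), average_ranks(b)
    mean = (len(a) + 1) / 2  # mean of ranks 1..n, unchanged by average ties
    cov = sum((x - mean) * (y - mean) for x, y in zip(ra, rb))
    var_a = sum((x - mean) ** 2 for x in ra)
    var_b = sum((y - mean) ** 2 for y in rb)
    if var_a == 0 or var_b == 0:
        raise ValueError("all scores tied; correlation undefined")
    return cov / (var_a * var_b) ** 0.5


def prune_set_overlap(a: Sequence[float], b: Sequence[float], k: int) -> float:
    """Jaccard overlap of the k lowest-scoring experts under each score vector."""
    if not 1 <= k <= len(a):
        raise ValueError("k must be between 1 and the number of experts")

    def lowest(scores: Sequence[float]) -> set[int]:
        return set(sorted(range(len(scores)), key=scores.__getitem__)[:k])

    sa, sb = lowest(a), lowest(b)
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    # Hypothetical importance scores for the 8 experts of one layer.
    in_domain = [0.91, 0.12, 0.55, 0.73, 0.30, 0.84, 0.21, 0.64]
    held_out = [0.88, 0.40, 0.51, 0.70, 0.15, 0.86, 0.19, 0.62]
    print(spearman(in_domain, held_out), prune_set_overlap(in_domain, held_out, k=2))
```

A high rank correlation together with a low prune-set overlap is the pattern the referee worries about: the overall ordering looks stable, yet the specific experts chosen for removal change with the input distribution.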

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. The concerns about the scope of our experimental evaluation and the need for robustness checks under distribution shifts are well-taken. We address each major comment point by point below and will make partial revisions to incorporate clarifications and a limitations discussion in the revised version.

Point-by-point responses
  1. Referee: Experimental evaluation (throughout the empirical sections): the reported benchmarks are confined to standard in-distribution test sets used by prior pruning methods. No out-of-distribution, domain-shift, or cross-task splits are presented, leaving open the possibility that the IT-based importance scores primarily capture in-distribution activation frequency rather than intrinsic expert utility. This directly affects the load-bearing claim that pruned models remain “effective across all benchmarks.”

    Authors: We acknowledge that our experiments follow the standard in-distribution benchmarks used by prior MoE pruning methods to ensure fair and reproducible comparisons. The information-theoretic measures in MoEITS are computed from activation statistics and aim to quantify each expert's contribution to the output distribution, which we posit goes beyond mere frequency. Nevertheless, we agree that explicit OOD or domain-shift evaluations would provide stronger evidence for generalization. In the revised manuscript, we will add a dedicated paragraph in the Discussion section noting this limitation, clarifying that our claims are scoped to the evaluated benchmarks, and outlining future work on cross-domain testing. revision: partial

  2. Referee: Section describing the pruning criterion: the paper asserts that standardized IT measures reliably identify removable experts, yet provides no ablation showing how these scores behave when the input distribution is altered (e.g., via synthetic distribution shifts or held-out domains). Without such evidence, the superiority over baselines may be an artifact of the evaluation distribution.

    Authors: The pruning criterion relies on standardized entropy and mutual information computed over observed expert activations for the given inputs, providing a principled ranking independent of any single baseline. We do not claim the scores are invariant to arbitrary shifts, but the theoretical analysis in the paper supports their use for identifying low-utility experts. To address the concern, we will revise the method section to include a short explanation of how the measures are expected to behave under moderate distribution changes and explicitly flag comprehensive shift ablations as future work. revision: partial
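Neither the rebuttal nor the pith spells out what "standardized" means. Assuming it means normalization to a fixed range, one minimal way to see how such a measure behaves under distribution change is to compare a layer's expert-usage distribution across two input sets: normalized entropy H(p)/log K summarizes how balanced routing is, and Jensen-Shannon divergence quantifies how far routing moves between domains. The routing logs below are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, Sequence


def usage_distribution(routed: Iterable[int], n_experts: int) -> list[float]:
    """Share of routing decisions sent to each expert (top-k choices flattened)."""
    counts = Counter(routed)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no routing decisions recorded")
    return [counts.get(e, 0) / total for e in range(n_experts)]


def normalized_entropy(p: Sequence[float]) -> float:
    """H(p) / log K: 1.0 for perfectly balanced routing, 0.0 if one expert takes all."""
    if len(p) < 2:
        raise ValueError("need at least two experts")
    h = -sum(x * math.log(x) for x in p if x > 0)
    return h / math.log(len(p))


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence in nats: 0 for identical usage, at most log 2."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a: Sequence[float], b: Sequence[float]) -> float:
        # Terms with a_i = 0 contribute 0; b is always the mixture m here,
        # so b_i > 0 whenever a_i > 0 and the log is well defined.
        return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


if __name__ == "__main__":
    # Hypothetical flattened top-2 routing logs for one 8-expert layer.
    in_domain = [0, 1, 2, 3, 4, 5, 6, 7] * 50 + [2, 5] * 100
    held_out = [0, 1, 2, 3, 4, 5, 6, 7] * 50 + [1, 7] * 100
    p = usage_distribution(in_domain, 8)
    q = usage_distribution(held_out, 8)
    print(normalized_entropy(p), normalized_entropy(q), js_divergence(p, q))
```

In this toy case both domains have identical normalized entropy while the divergence is clearly nonzero, illustrating why a single per-layer summary can look stable even when the experts that matter shift between domains.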

Circularity Check

0 steps flagged

MoEITS derivation is self-contained with no load-bearing circular steps

Full rationale

The paper introduces MoEITS as a new pruning algorithm derived from standard information-theoretic measures (entropy, mutual information) applied to MoE expert selection. No equations or steps reduce the claimed performance gains to fitted parameters renamed as predictions, self-definitional loops, or self-citation chains; the method is presented as original and evaluated via direct comparison to external SOTA baselines on Mixtral, Qwen, and DeepSeek models. The central claims rest on empirical benchmarking rather than tautological re-derivation of inputs, satisfying the criteria for an independent derivation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that information-theoretic quantities can serve as reliable proxies for expert utility in MoE architectures; no free parameters or invented entities are described in the abstract.

axioms (1)
  • Domain assumption: information-theoretic measures can effectively quantify the contribution of individual experts in MoE-LLMs for pruning decisions.
    The method is characterized by reliance on standardized Information Theoretic frameworks to achieve simplification.

pith-pipeline@v0.9.0 · 5561 in / 1172 out tokens · 70063 ms · 2026-05-10T16:05:17.103332+00:00 · methodology

