MoEITS: A Green AI approach for simplifying MoE-LLMs
Pith reviewed 2026-05-10 16:05 UTC · model grok-4.3
The pith
MoEITS prunes experts in Mixture-of-Experts LLMs using information theory to create smaller, equally accurate models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MoEITS identifies redundant experts in MoE-based large language models via information-theoretic criteria and removes them, resulting in simplified models that retain effectiveness on all tested benchmarks while achieving greater computational efficiency than existing pruning approaches.
What carries the argument
The MoEITS algorithm, which ranks experts by their information contribution using standardized measures and prunes those with the lowest scores.
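A minimal sketch of the rank-then-prune pattern described here, assuming a per-layer routing log is available as a 0/1 token-by-expert matrix. The scoring function (entropy of each expert's on/off indicator) is a hypothetical stand-in chosen for illustration; the paper's actual measure is not reproduced on this page, and the names `expert_scores` and `experts_to_prune` are invented.

```python
import math
from typing import List, Sequence


def binary_entropy(p: float) -> float:
    """Shannon entropy (in bits) of a Bernoulli(p) variable; lies in [0, 1]."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def expert_scores(activations: Sequence[Sequence[int]]) -> List[float]:
    """Score each expert by the entropy of its on/off routing indicator.

    activations[t][e] is 1 if token t was routed to expert e, else 0.
    HYPOTHETICAL score: a stand-in for illustration, not MoEITS itself.
    """
    num_tokens = len(activations)
    num_experts = len(activations[0])
    scores = []
    for e in range(num_experts):
        p = sum(row[e] for row in activations) / num_tokens
        scores.append(binary_entropy(p))
    return scores


def experts_to_prune(scores: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k lowest-scoring experts (ties broken by index)."""
    ranked = sorted(range(len(scores)), key=lambda e: (scores[e], e))
    return sorted(ranked[:k])


# Toy routing log (made-up data): 4 tokens, 4 experts, top-2 routing.
toy = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
]
scores = expert_scores(toy)
pruned = experts_to_prune(scores, k=1)
print(scores, pruned)
```

On the toy log, expert 3 is never routed to, scores 0 bits, and is the one selected. An always-on expert would also score 0 under this stand-in, which is one reason a frequency-only score is a weak proxy for expert utility.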
If this is right
- Simplified MoE-LLMs need less memory and run inference faster.
- Energy consumption during model use decreases, aligning with green AI goals.
- The pruning process itself has low computational complexity.
- Results hold across different base models such as Qwen and DeepSeek variants.
- Models remain effective without needing retraining after pruning.
Where Pith is reading between the lines
- Similar information measures could guide pruning in non-MoE transformer architectures.
- Integrating MoEITS into training loops might prevent over-provisioning of experts from the start.
- Deployments on resource-limited devices become more feasible for high-performing models.
- Future work could explore combining this with other compression methods like distillation.
Load-bearing premise
Standardized information-theoretic measures can accurately identify experts that are dispensable without harming the model's ability to handle new or unseen tasks.
What would settle it
Running the simplified models on a completely new benchmark or domain and observing a significant drop in performance compared to the original model.
Original abstract
Large language models are transforming all areas of academia and industry, attracting the attention of researchers, professionals, and the general public. In the quest for more powerful architectures, Mixture-of-Experts models, inspired by ensemble models, have emerged as one of the most effective paths to follow. However, this implies a high computational burden for both training and inference. To reduce the impact on computing and memory footprint as well as energy consumption, simplification methods have arisen as very effective procedures. In this paper, an original algorithm, MoEITS, for MoE-LLM simplification is presented. The algorithm is characterized by a refined simplicity, underpinned by standardized Information Theoretic frameworks. MoEITS is analyzed in depth from theoretical and practical points of view. Its computational complexity is studied. Its performance, in terms of the accuracy of the simplified LLMs and the reduction rate achieved, is assessed through thoroughly designed experiments. This empirical evaluation includes a comparison with state-of-the-art MoE-LLM pruning methods applied to Mixtral $8\times7$B, Qwen1.5-2.7B, and DeepSeek-V2-Lite. The extensive experimentation conducted demonstrates that MoEITS outperforms state-of-the-art techniques by generating models that are both effective across all benchmarks and computationally efficient. The code implementing the method will be available at https://github.com/luisbalru/MoEITS.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces MoEITS, an algorithm for simplifying Mixture-of-Experts LLMs via standardized information-theoretic measures (entropy, mutual information, etc.) to prune experts. It provides a theoretical analysis of the method, studies its computational complexity, and reports empirical results claiming that the resulting models outperform prior state-of-the-art pruning techniques in both accuracy and efficiency on standard benchmarks. Experiments are conducted on Mixtral 8×7B, Qwen1.5-2.7B, and DeepSeek-V2-Lite, with code promised for release.
Significance. If the central claims hold, the work offers a principled, information-theoretic route to reducing the inference cost and energy footprint of MoE architectures without sacrificing benchmark performance. The explicit complexity analysis and promised open-source implementation are positive features that would aid reproducibility and adoption in green-AI research.
major comments (2)
- Experimental evaluation (throughout the empirical sections): the reported benchmarks are confined to standard in-distribution test sets used by prior pruning methods. No out-of-distribution, domain-shift, or cross-task splits are presented, leaving open the possibility that the IT-based importance scores primarily capture in-distribution activation frequency rather than intrinsic expert utility. This directly affects the load-bearing claim that pruned models remain “effective across all benchmarks.”
- Section describing the pruning criterion: the paper asserts that standardized IT measures reliably identify removable experts, yet provides no ablation showing how these scores behave when the input distribution is altered (e.g., via synthetic distribution shifts or held-out domains). Without such evidence, the superiority over baselines may be an artifact of the evaluation distribution.
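One way the requested ablation could be set up, sketched under assumptions: compute per-expert importance scores on two calibration sets (in-distribution and shifted) and compare the resulting rankings with Spearman correlation. The score values below are made-up placeholders, not results from the paper, and the helper names are hypothetical.

```python
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    return cov / (var_a * var_b) ** 0.5


# Placeholder importance scores for four experts (illustrative only).
in_dist = [0.9, 0.7, 0.4, 0.1]   # scored on the original calibration data
shifted = [0.8, 0.3, 0.6, 0.2]   # scored on a held-out or shifted domain
rho = spearman(in_dist, shifted)
print(rho)
```

A correlation near 1 would suggest the expert ranking is stable under the shift; a low value would support the referee's concern. The sketch does not guard against a constant score vector, for which the denominator is zero.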
minor comments (2)
- The abstract and introduction repeatedly use “simplification” and “pruning” interchangeably; a brief clarifying sentence on the precise relationship would improve readability.
- Figure captions and table headers should explicitly state the number of runs and any statistical significance tests performed, rather than reporting single-point metrics.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback on our manuscript. The concerns about the scope of our experimental evaluation and the need for robustness checks under distribution shifts are well-taken. We address each major comment point by point below and will make partial revisions to incorporate clarifications and a limitations discussion in the revised version.
- Referee: Experimental evaluation (throughout the empirical sections): the reported benchmarks are confined to standard in-distribution test sets used by prior pruning methods. No out-of-distribution, domain-shift, or cross-task splits are presented, leaving open the possibility that the IT-based importance scores primarily capture in-distribution activation frequency rather than intrinsic expert utility. This directly affects the load-bearing claim that pruned models remain “effective across all benchmarks.”
  Authors: We acknowledge that our experiments follow the standard in-distribution benchmarks used by prior MoE pruning methods to ensure fair and reproducible comparisons. The information-theoretic measures in MoEITS are computed from activation statistics and aim to quantify each expert's contribution to the output distribution, which we posit goes beyond mere frequency. Nevertheless, we agree that explicit OOD or domain-shift evaluations would provide stronger evidence for generalization. In the revised manuscript, we will add a dedicated paragraph in the Discussion section noting this limitation, clarifying that our claims are scoped to the evaluated benchmarks, and outlining future work on cross-domain testing.
  Revision: partial
- Referee: Section describing the pruning criterion: the paper asserts that standardized IT measures reliably identify removable experts, yet provides no ablation showing how these scores behave when the input distribution is altered (e.g., via synthetic distribution shifts or held-out domains). Without such evidence, the superiority over baselines may be an artifact of the evaluation distribution.
  Authors: The pruning criterion relies on standardized entropy and mutual information computed over observed expert activations for the given inputs, providing a principled ranking independent of any single baseline. We do not claim the scores are invariant to arbitrary shifts, but the theoretical analysis in the paper supports their use for identifying low-utility experts. To address the concern, we will revise the method section to include a short explanation of how the measures are expected to behave under moderate distribution changes and explicitly flag comprehensive shift ablations as future work.
  Revision: partial
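For readers unfamiliar with the quantities named in this response, the following sketch computes empirical entropy and a normalized mutual information between two experts' on/off activation indicators. The geometric-mean normalization is one common choice and is an assumption here; this page does not state which standardization MoEITS uses, and the function names and toy data are illustrative.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(xs: Sequence[Hashable]) -> float:
    """Empirical Shannon entropy in bits."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Empirical I(X;Y) = H(X) + H(Y) - H(X,Y), in bits."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def normalized_mi(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """I(X;Y) / sqrt(H(X) * H(Y)); defined as 0 when either entropy is 0."""
    hx, hy = entropy(xs), entropy(ys)
    if hx == 0.0 or hy == 0.0:
        return 0.0
    return mutual_information(xs, ys) / math.sqrt(hx * hy)


# Toy activation indicators over 8 tokens (1 = expert was selected).
expert_a = [1, 1, 0, 0, 1, 0, 1, 0]
expert_b = [1, 1, 0, 0, 1, 0, 1, 0]  # fires on exactly the same tokens as A
expert_c = [1, 0, 1, 0, 1, 1, 0, 0]  # empirically independent of A

nmi_ab = normalized_mi(expert_a, expert_b)
nmi_ac = normalized_mi(expert_a, expert_c)
print(nmi_ab, nmi_ac)
```

Under this normalization, two experts that fire on exactly the same tokens score 1 and empirically independent experts score 0, so a high value flags candidate redundancy on the observed inputs only. That dependence on the observed inputs is the point the referee raises.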
Circularity Check
MoEITS derivation is self-contained with no load-bearing circular steps
The paper introduces MoEITS as a new pruning algorithm derived from standard information-theoretic measures (entropy, mutual information) applied to MoE expert selection. No equations or steps reduce the claimed performance gains to fitted parameters renamed as predictions, self-definitional loops, or self-citation chains; the method is presented as original and evaluated via direct comparison to external SOTA baselines on Mixtral, Qwen, and DeepSeek models. The central claims rest on empirical benchmarking rather than tautological re-derivation of inputs, satisfying the criteria for an independent derivation chain.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: information-theoretic measures can effectively quantify the contribution of individual experts in MoE-LLMs for pruning decisions.