MoEITS: A Green AI approach for simplifying MoE-LLMs

José M. Benítez; Luis Balderas; Miguel Lastra

arXiv: 2604.10603 · v1 · submitted 2026-04-12 · 💻 cs.LG · cs.AI · cs.PF


Pith reviewed 2026-05-10 16:05 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.PF

keywords MoE · LLM pruning · information theory · model compression · efficient AI · mixture of experts · green AI


The pith

MoEITS prunes experts in Mixture-of-Experts LLMs using information theory to create smaller, equally accurate models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces MoEITS as a new algorithm to simplify large language models that use mixture of experts. It applies standardized information theoretic measures to decide which experts can be removed, thereby lowering the computational demands and energy use. Through tests on models including Mixtral 8x7B, the method produces versions that perform well on benchmarks and use fewer resources than models simplified by other current techniques. Readers would care if they want powerful AI systems that run efficiently on available hardware. The work includes both theoretical analysis of the algorithm's complexity and practical comparisons.

Core claim

MoEITS identifies redundant experts in MoE-based large language models via information-theoretic criteria and removes them, resulting in simplified models that retain effectiveness on all tested benchmarks while achieving greater computational efficiency than existing pruning approaches.

What carries the argument

The MoEITS algorithm, which ranks experts by their information contribution using standardized measures and prunes those with the lowest scores.
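The rank-and-prune step described here can be illustrated with a minimal Python sketch. The scores are hypothetical placeholders and `select_experts_to_prune` is an illustrative name; how MoEITS actually computes its information-theoretic scores is not specified on this page, so the sketch only covers the selection step.

```python
def select_experts_to_prune(scores, n_prune):
    """Split expert indices into (kept, pruned), pruning the n_prune lowest scores.

    `scores` holds one importance value per expert. How the values are
    computed is an assumption left outside this sketch.
    """
    if not 0 <= n_prune < len(scores):
        raise ValueError("n_prune must leave at least one expert")
    # Expert indices ordered from lowest to highest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    pruned = sorted(order[:n_prune])
    kept = sorted(order[n_prune:])
    return kept, pruned


# Hypothetical scores for the eight experts of a single MoE layer.
scores = [0.42, 0.05, 0.31, 0.77, 0.12, 0.58, 0.09, 0.66]
kept, pruned = select_experts_to_prune(scores, n_prune=2)
print("kept:", kept, "pruned:", pruned)
```

In a real model the router would also need to be restricted to the kept experts; that step depends on the model implementation and is not shown.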

If this is right

Simplified MoE-LLMs need less memory and run inference faster.
Energy consumption during model use decreases, aligning with green AI goals.
The pruning process itself has low computational complexity.
Results hold across different base models such as Qwen and DeepSeek variants.
Models remain effective without needing retraining after pruning.
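To make the memory consequence concrete, the following Python sketch estimates how many expert weights disappear when a fixed number of experts is removed from every layer. All dimensions are illustrative assumptions chosen to resemble a Mixtral-style layout (gated feed-forward experts with three weight matrices, 16-bit weights); they are not figures reported by the paper.

```python
def expert_params(hidden_size, intermediate_size, n_matrices=3):
    """Weights in one expert feed-forward block (biases ignored)."""
    return n_matrices * hidden_size * intermediate_size


def pruning_savings(hidden_size, intermediate_size, n_layers,
                    n_experts, n_pruned_per_layer, bytes_per_param=2):
    """Estimate the expert weights removed by pruning the same count per layer."""
    per_expert = expert_params(hidden_size, intermediate_size)
    removed = per_expert * n_pruned_per_layer * n_layers
    total = per_expert * n_experts * n_layers
    return {
        "removed_params": removed,
        "fraction_of_expert_params": removed / total,
        "saved_gib": removed * bytes_per_param / 2**30,
    }


# Assumed configuration: 32 layers, 8 experts per layer, 2 pruned per layer.
savings = pruning_savings(hidden_size=4096, intermediate_size=14336,
                          n_layers=32, n_experts=8, n_pruned_per_layer=2)
print(savings)
```

The sketch covers storage only. With top-k routing, per-token compute depends mainly on how many experts are activated rather than how many are stored, so latency gains would have to be measured separately.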

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar information measures could guide pruning in non-MoE transformer architectures.
Integrating MoEITS into training loops might prevent over-provisioning of experts from the start.
Deployments on resource-limited devices become more feasible for high-performing models.
Future work could explore combining this with other compression methods like distillation.

Load-bearing premise

Standardized information-theoretic measures can accurately identify experts that are dispensable without harming the model's ability to handle new or unseen tasks.

What would settle it

Running the simplified models on a completely new benchmark or domain and observing a significant drop in performance compared to the original model.

Figures

Figures reproduced from arXiv: 2604.10603 by José M. Benítez, Luis Balderas, Miguel Lastra.

**Figure 2.** Evolution of the simplified Qwen1.5-2.7B model with MoEITS for different values of …

Original abstract

Large language models are transforming all areas of academia and industry, attracting the attention of researchers, professionals, and the general public. In the trek for more powerful architectures, Mixture-of-Experts, inspired by ensemble models, have emerged as one of the most effective ways to follow. However, this implies a high computational burden for both training and inference. To reduce the impact on computing and memory footprint as well as the energy consumption, simplification methods have arisen as very effective procedures. In this paper, an original algorithm, MoEITS, for MoE-LLMs simplification is presented. The algorithm is characterized by a refined simplicity, underpinned by standardized Information Theoretic frameworks. MoEITS is analyzed in depth from theoretical and practical points of view. Its computational complexity is studied. Its performance on the accuracy of the simplified LLMs and the reduction rate achieved is assessed through a thoroughly designed experimentation. This empirical evaluation includes a comparison with state-of-the-art MoE-LLM pruning methods applied on Mixtral $8\times7$B, Qwen1.5-2.7B, and DeepSeek-V2-Lite. The extensive experimentation conducted demonstrates that MoEITS outperforms state-of-the-art techniques by generating models that are both effective across all benchmarks and computationally efficient. The code implementing the method will be available at https://github.com/luisbalru/MoEITS.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

MoEITS brings a new information-theoretic approach to pruning MoE experts, but its robustness needs checking beyond standard benchmarks.


This paper's main contribution is MoEITS, an algorithm that simplifies Mixture-of-Experts LLMs by pruning experts based on standardized information-theoretic measures like entropy and mutual information. It is new in applying these IT frameworks directly to expert selection in a way that the abstract says differs from previous pruning techniques. The work does well by testing on actual large models including Mixtral 8x7B and reporting both maintained accuracy and efficiency gains over state-of-the-art methods. The promise to release code helps with checking the claims. The main soft spot is the lack of detail in the abstract on the precise pruning criteria and theoretical analysis. Experiments appear confined to standard benchmarks without tests for domain shifts or unseen tasks. This leaves open the possibility that the IT scores correlate more with in-distribution patterns than true expert utility, which could limit real-world applicability. The paper targets engineers and researchers focused on reducing the computational and energy costs of MoE architectures. It shows clear thinking in framing a practical method with comparisons, so it deserves a serious referee to examine the derivations and run additional validations. I recommend putting it through peer review.

Referee Report

2 major / 2 minor

Summary. The paper introduces MoEITS, an algorithm for simplifying Mixture-of-Experts LLMs via standardized information-theoretic measures (entropy, mutual information, etc.) to prune experts. It provides a theoretical analysis of the method, studies its computational complexity, and reports empirical results claiming that the resulting models outperform prior state-of-the-art pruning techniques in both accuracy and efficiency on standard benchmarks. Experiments are conducted on Mixtral 8×7B, Qwen1.5-2.7B, and DeepSeek-V2-Lite, with code promised for release.
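For reference, the quantities named in the summary have the following textbook definitions for discrete random variables $X$ and $Y$. The normalized form shown is one common choice among several; which variables MoEITS measures and which normalization it adopts are not stated on this page, so the block below is background rather than the paper's criterion.

```latex
% Shannon entropy of a discrete random variable X
H(X) = -\sum_{x} p(x) \log p(x)

% Mutual information between X and Y
I(X;Y) = \sum_{x}\sum_{y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)}
       = H(X) + H(Y) - H(X,Y)

% One common normalization (geometric mean of the entropies), valued in [0, 1]
\mathrm{NMI}(X;Y) = \frac{I(X;Y)}{\sqrt{H(X)\,H(Y)}}, \qquad H(X),\, H(Y) > 0
```

Normalizing puts scores from variables with different entropies on a comparable scale, which is presumably what "standardized" refers to here.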

Significance. If the central claims hold, the work offers a principled, information-theoretic route to reducing the inference cost and energy footprint of MoE architectures without sacrificing benchmark performance. The explicit complexity analysis and promised open-source implementation are positive features that would aid reproducibility and adoption in green-AI research.

major comments (2)

Experimental evaluation (throughout the empirical sections): the reported benchmarks are confined to standard in-distribution test sets used by prior pruning methods. No out-of-distribution, domain-shift, or cross-task splits are presented, leaving open the possibility that the IT-based importance scores primarily capture in-distribution activation frequency rather than intrinsic expert utility. This directly affects the load-bearing claim that pruned models remain “effective across all benchmarks.”
Section describing the pruning criterion: the paper asserts that standardized IT measures reliably identify removable experts, yet provides no ablation showing how these scores behave when the input distribution is altered (e.g., via synthetic distribution shifts or held-out domains). Without such evidence, the superiority over baselines may be an artifact of the evaluation distribution.
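The ablation requested in these comments could take roughly the following shape, shown as a Python sketch: score the experts once on in-distribution inputs and once on shifted inputs, then compare the two rankings and the resulting pruned sets. The function names and the example scores are hypothetical, and the scoring function itself is assumed to exist elsewhere.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    ra, rb = average_ranks(a), average_ranks(b)
    ma, mb = mean(ra), mean(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (var_a * var_b) ** 0.5


def pruned_set_overlap(a, b, n_prune):
    """Fraction of the n_prune lowest-scored experts shared by both score lists."""
    def lowest(scores):
        order = sorted(range(len(scores)), key=lambda i: scores[i])
        return set(order[:n_prune])
    return len(lowest(a) & lowest(b)) / n_prune


# Hypothetical per-expert scores under two input distributions.
in_dist = [0.42, 0.05, 0.31, 0.77]
shifted = [0.40, 0.50, 0.10, 0.90]
print(spearman(in_dist, shifted), pruned_set_overlap(in_dist, shifted, n_prune=2))
```

A high rank correlation and a large overlap would suggest the scores are stable under the shift; low values would support the referee's concern that the ranking depends on the evaluation distribution.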

minor comments (2)

The abstract and introduction repeatedly use “simplification” and “pruning” interchangeably; a brief clarifying sentence on the precise relationship would improve readability.
Figure captions and table headers should explicitly state the number of runs and any statistical significance tests performed, rather than reporting single-point metrics.
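As an illustration of the reporting style the second minor comment asks for, this Python sketch summarizes a metric over several runs instead of a single point. The accuracies are made-up placeholders and `summarize_runs` is an illustrative helper, not part of the paper's code.

```python
from statistics import mean, stdev


def summarize_runs(scores):
    """Return (mean, sample standard deviation, number of runs)."""
    if len(scores) < 2:
        raise ValueError("need at least two runs to estimate spread")
    return mean(scores), stdev(scores), len(scores)


# Hypothetical accuracies from five runs (for example, different seeds).
runs = [0.712, 0.705, 0.719, 0.708, 0.716]
avg, sd, n = summarize_runs(runs)
print(f"accuracy = {avg:.3f} ± {sd:.3f} (n = {n})")
```

Reporting the spread alongside the mean lets readers judge whether a gap between two pruning methods exceeds run-to-run variation; a formal significance test would be a further step.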

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. The concerns about the scope of our experimental evaluation and the need for robustness checks under distribution shifts are well-taken. We address each major comment point by point below and will make partial revisions to incorporate clarifications and a limitations discussion in the revised version.


Referee: Experimental evaluation (throughout the empirical sections): the reported benchmarks are confined to standard in-distribution test sets used by prior pruning methods. No out-of-distribution, domain-shift, or cross-task splits are presented, leaving open the possibility that the IT-based importance scores primarily capture in-distribution activation frequency rather than intrinsic expert utility. This directly affects the load-bearing claim that pruned models remain “effective across all benchmarks.”

Authors: We acknowledge that our experiments follow the standard in-distribution benchmarks used by prior MoE pruning methods to ensure fair and reproducible comparisons. The information-theoretic measures in MoEITS are computed from activation statistics and aim to quantify each expert's contribution to the output distribution, which we posit goes beyond mere frequency. Nevertheless, we agree that explicit OOD or domain-shift evaluations would provide stronger evidence for generalization. In the revised manuscript, we will add a dedicated paragraph in the Discussion section noting this limitation, clarifying that our claims are scoped to the evaluated benchmarks, and outlining future work on cross-domain testing. revision: partial
Referee: Section describing the pruning criterion: the paper asserts that standardized IT measures reliably identify removable experts, yet provides no ablation showing how these scores behave when the input distribution is altered (e.g., via synthetic distribution shifts or held-out domains). Without such evidence, the superiority over baselines may be an artifact of the evaluation distribution.

Authors: The pruning criterion relies on standardized entropy and mutual information computed over observed expert activations for the given inputs, providing a principled ranking independent of any single baseline. We do not claim the scores are invariant to arbitrary shifts, but the theoretical analysis in the paper supports their use for identifying low-utility experts. To address the concern, we will revise the method section to include a short explanation of how the measures are expected to behave under moderate distribution changes and explicitly flag comprehensive shift ablations as future work. revision: partial

Circularity Check

0 steps flagged

MoEITS derivation is self-contained with no load-bearing circular steps


The paper introduces MoEITS as a new pruning algorithm derived from standard information-theoretic measures (entropy, mutual information) applied to MoE expert selection. No equations or steps reduce the claimed performance gains to fitted parameters renamed as predictions, self-definitional loops, or self-citation chains; the method is presented as original and evaluated via direct comparison to external SOTA baselines on Mixtral, Qwen, and DeepSeek models. The central claims rest on empirical benchmarking rather than tautological re-derivation of inputs, satisfying the criteria for an independent derivation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that information-theoretic quantities can serve as reliable proxies for expert utility in MoE architectures; no free parameters or invented entities are described in the abstract.

axioms (1)

Domain assumption: Information-theoretic measures can effectively quantify the contribution of individual experts in MoE-LLMs for pruning decisions.
The method is characterized by reliance on standardized Information Theoretic frameworks to achieve simplification.



Reference graph

Works this paper leans on

66 extracted references · 66 canonical work pages

[1]

https://huggingface.co/ datasets/teknium/{O}pen{H}ermes-2.5

teknium/OpenHermes-2.5 · Datasets at Hugging Face — huggingface.co. https://huggingface.co/ datasets/teknium/{O}pen{H}ermes-2.5. [Accessed 01-03-2026]. 16 MoEITS: A Green AI approach for simplifying MoE-LLMs

work page 2026
[2]

Croci, Marcelo Gennari do Nascimento, Torsten Hoefler, and James Hensman

Saleh Ashkboos, Maximilian L. Croci, Marcelo Gennari do Nascimento, Torsten Hoefler, and James Hensman. SliceGPT: Compress large language models by deleting rows and columns. InThe Twelfth International Conference on Learning Representations, 2024

work page 2024
[3]

MoEITS healing phase: OpenHermes 2.5 training subset

Luis Balderas. MoEITS healing phase: OpenHermes 2.5 training subset. https://zenodo.org/records/19535551, 2026

work page arXiv 2026
[4]

Data-driven stock forecasting models based on neural networks: A review.Information Fusion, 113:102616, 2025

Wuzhida Bao, Yuting Cao, Yin Yang, Hangjun Che, Junjian Huang, and Shiping Wen. Data-driven stock forecasting models based on neural networks: A review.Information Fusion, 113:102616, 2025

work page 2025
[5]

A survey on mixture of experts, 2024

Weilin Cai, Juyong Jiang, Fan Wang, Jing Tang, Sunghun Kim, and Jiayi Huang. A survey on mixture of experts, 2024

work page 2024
[6]

EAC-MoE: Expert-selection aware compressor for mixture-of-experts large language models

Yuanteng Chen, Yuantian Shao, Peisong Wang, and Jian Cheng. EAC-MoE: Expert-selection aware compressor for mixture-of-experts large language models. In Wanxiang Che, Joyce Nabende, Ekaterina Shutova, and Mohammad Taher Pilehvar, editors,Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1...

work page 2025
[7]

Survey on efficient large language models: Principles, algorithms, applications, and open issues.IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2025 NOV 14 2025

Jian Cheng, Haidong Kang, Yuxin Shao, Nan Li, Pengjun Chen, Rui Wang, Saiqin Long, Xiaochun Yang, and Lianbo Ma. Survey on efficient large language models: Principles, algorithms, applications, and open issues.IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2025 NOV 14 2025

work page 2025
[8]

A provably effective method for pruning experts in fine-tuned sparse mixture-of-experts

Mohammed Nowaz Rabbani Chowdhury, Meng Wang, Kaoutar El Maghraoui, Naigang Wang, Pin-Yu Chen, and Christopher Carothers. A provably effective method for pruning experts in fine-tuned sparse mixture-of-experts. In Ruslan Salakhutdinov, Zico Kolter, Katherine Heller, Adrian Weller, Nuria Oliver, Jonathan Scarlett, and Felix Berkenkamp, editors,Proceedings o...

work page 2024
[9]

BoolQ: Exploring the surprising difficulty of natural yes/no questions

Christopher Clark, Kenton Lee, Ming-Wei Chang, Tom Kwiatkowski, Michael Collins, and Kristina Toutanova. BoolQ: Exploring the surprising difficulty of natural yes/no questions. In Jill Burstein, Christy Doran, and Thamar Solorio, editors,Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Huma...

work page 2019
[10]

Think you have solved question answering? try arc, the ai2 reasoning challenge, 2018

Peter Clark, Isaac Cowhey, Oren Etzioni, Tushar Khot, Ashish Sabharwal, Carissa Schoenick, and Oyvind Tafjord. Think you have solved question answering? try arc, the ai2 reasoning challenge, 2018

work page 2018
[11]

Cover and Joy A

Thomas M. Cover and Joy A. Thomas.Elements of Information Theory (Wiley Series in Telecommuni- cations and Signal Processing). Wiley-Interscience, USA, 2006

work page 2006
[12]

Damai Dai, Chengqi Deng, Chenggang Zhao, R. X. Xu, Huazuo Gao, Deli Chen, Jiashi Li, Wangding Zeng, Xingkai Yu, Y . Wu, Zhenda Xie, Y . K. Li, Panpan Huang, Fuli Luo, Chong Ruan, Zhifang Sui, and Wenfeng Liang. Deepseekmoe: Towards ultimate expert specialization in mixture-of-experts language models, 2024

work page 2024
[13]

How significant is a boxplot outlier?Journal of Statistics Education, 19(2), 2011

Robert Dawson. How significant is a boxplot outlier?Journal of Statistics Education, 19(2), 2011. 17 MoEITS: A Green AI approach for simplifying MoE-LLMs

work page 2011
[14]

Llm.int8(): 8-bit matrix multi- plication for transformers at scale

Tim Dettmers, Mike Lewis, Younes Belkada, and Luke Zettlemoyer. Llm.int8(): 8-bit matrix multi- plication for transformers at scale. InProceedings of the 36th International Conference on Neural Information Processing Systems, NIPS ’22, Red Hook, NY , USA, 2022. Curran Associates Inc

work page 2022
[15]

Qlora: efficient finetuning of quantized llms

Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, and Luke Zettlemoyer. Qlora: efficient finetuning of quantized llms. InProceedings of the 37th International Conference on Neural Information Processing Systems, NIPS ’23, Red Hook, NY , USA, 2023. Curran Associates Inc

work page 2023
[16]

Spqr: A sparse-quantized representation for near-lossless llm weight compression, 2023

Tim Dettmers, Ruslan Svirschevski, Vage Egiazarian, Denis Kuznedelev, Elias Frantar, Saleh Ashkboos, Alexander Borzunov, Torsten Hoefler, and Dan Alistarh. Spqr: A sparse-quantized representation for near-lossless llm weight compression, 2023

work page 2023
[17]

Sustainable llm serving: Environmental implications, challenges, and opportunities : Invited paper

Yi Ding and Tianyao Shi. Sustainable llm serving: Environmental implications, challenges, and opportunities : Invited paper. In2024 IEEE 15th International Green and Sustainable Computing Conference (IGSC), pages 37–38, 2024

work page 2024
[18]

Tyrone E. Duncan. On the calculation of mutual information.SIAM Journal on Applied Mathematics, 19(1):215–220, 1970

work page 1970
[19]

Cruz, and George D.C

Faramarz Farhangian, Rafael M.O. Cruz, and George D.C. Cavalcanti. Fake news detection: Taxonomy and comparative study.Information Fusion, 103:102140, 2024

work page 2024
[20]

Switch transformers: scaling to trillion parameter models with simple and efficient sparsity.J

William Fedus, Barret Zoph, and Noam Shazeer. Switch transformers: scaling to trillion parameter models with simple and efficient sparsity.J. Mach. Learn. Res., 23(1), January 2022

work page 2022
[21]

DIVE into MoE: Diversity-enhanced reconstruction of large language models from dense into mixture-of- experts

Yuchen Feng, Bowen Shen, Naibin Gu, Jiaxuan Zhao, Peng Fu, Zheng Lin, and Weiping Wang. DIVE into MoE: Diversity-enhanced reconstruction of large language models from dense into mixture-of- experts. In Wanxiang Che, Joyce Nabende, Ekaterina Shutova, and Mohammad Taher Pilehvar, editors, Proceedings of the 63rd Annual Meeting of the Association for Computa...

work page 2025
[22]

The language model evaluation harness, 07 2024

Leo Gao, Jonathan Tow, Baber Abbasi, Stella Biderman, Sid Black, Anthony DiPofi, Charles Foster, Laurence Golding, Jeffrey Hsu, Alain Le Noac’h, Haonan Li, Kyle McDonell, Niklas Muennighoff, Chris Ociepa, Jason Phang, Laria Reynolds, Hailey Schoelkopf, Aviya Skowron, Lintang Sutawika, Eric Tang, Anish Thite, Ben Wang, Kevin Wang, and Andy Zou. The languag...

work page 2024
[23]

Tomoe: Converting dense large language models to mixture-of-experts through dynamic structural pruning, 2025

Shangqian Gao, Ting Hua, Reza Shirkavand, Chi-Heng Lin, Zhen Tang, Zhengao Li, Longge Yuan, Fangyi Li, Zeyu Zhang, Alireza Ganjdanesh, Lou Qian, Xu Jie, and Yen-Chang Hsu. Tomoe: Converting dense large language models to mixture-of-experts through dynamic structural pruning, 2025

work page 2025
[24]

Polarquant: Quantizing kv caches with polar transformation, 2025

Insu Han, Praneeth Kacham, Amin Karbasi, Vahab Mirrokni, and Amir Zandieh. Polarquant: Quantizing kv caches with polar transformation, 2025

work page 2025
[25]

A survey of large language models for healthcare: from data, technology, and applications to accountability and ethics.Information Fusion, 118:102963, 2025

Kai He, Rui Mao, Qika Lin, Yucheng Ruan, Xiang Lan, Mengling Feng, and Erik Cambria. A survey of large language models for healthcare: from data, technology, and applications to accountability and ethics.Information Fusion, 118:102963, 2025

work page 2025
[26]

LoRA: Low-rank adaptation of large language models

Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. InInternational Conference on Learning Representations, 2022. 18 MoEITS: A Green AI approach for simplifying MoE-LLMs

work page 2022
[27]

Wentao Hu, Mingkuan Zhao, Shuangyong Song, Xiaoyan Zhu, Xin Lai, and Jiayin Wang. Mosaic pruning: A hierarchical framework for generalizable pruning of mixture-of-experts models.Proceedings of the AAAI Conference on Artificial Intelligence, 40(26):21885–21893, Mar. 2026

work page 2026
[28]

MC-moe: Mixture compressor for mixture-of-experts LLMs gains more

Wei Huang, Yue Liao, Jianhui Liu, Ruifei He, Haoru Tan, Shiming Zhang, Hongsheng Li, Si Liu, and XIAOJUAN QI. MC-moe: Mixture compressor for mixture-of-experts LLMs gains more. InThe Thirteenth International Conference on Learning Representations, 2025

work page 2025
[29]

Billm: pushing the limit of post-training quantization for llms

Wei Huang, Yangdong Liu, Haotong Qin, Ying Li, Shiming Zhang, Xianglong Liu, Michele Magno, and Xiaojuan Qi. Billm: pushing the limit of post-training quantization for llms. InProceedings of the 41st International Conference on Machine Learning, ICML’24. JMLR.org, 2024

work page 2024
[30]

How good are low-bit quantized llama3 models? an empirical study,

Wei Huang, Xudong Ma, Haotong Qin, Xingyu Zheng, Chengtao Lv, Hong Chen, Jie Luo, Xiaojuan Qi, Xianglong Liu, and Michele Magno. How good are low-bit quantized llama3 models? an empirical study.CoRR, abs/2404.14047, 2024

work page arXiv 2024
[31]

Farhan Ishmam, Md

Md. Farhan Ishmam, Md. Sakib Hossain Shovon, M.F. Mridha, and Nilanjan Dey. From image to lan- guage: A critical analysis of visual question answering (vqa) approaches, challenges, and opportunities. Information Fusion, 106:102270, 2024

work page 2024
[32]

Jacobs, Michael I

Robert A. Jacobs, Michael I. Jordan, Steven J. Nowlan, and Geoffrey E. Hinton. Adaptive mixtures of local experts.Neural Computation, 3(1):79–87, 1991

work page 1991
[33]

Albert Q. Jiang, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Emma Bou Hanna, Florian Bressand, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Sandeep Subramanian, Sophia Yang, Szymon Antoniak, Teven...

work page 2024
[34]

Kvålseth

Tarald O. Kvålseth. On normalized mutual information: Measure derivations and properties.Entropy, 19(11), 2017

work page 2017
[35]

STUN: Structured-then-unstructured pruning for scalable MoE pruning

Jaeseong Lee, Seung-won Hwang, Aurick Qiao, Daniel F Campos, Zhewei Yao, and Yuxiong He. STUN: Structured-then-unstructured pruning for scalable MoE pruning. In Wanxiang Che, Joyce Nabende, Ekaterina Shutova, and Mohammad Taher Pilehvar, editors,Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)...

work page 2025
[36]

Lin Li, Yan Wang, and Zhuopeng Wang. C-gnn-prune: A unified graph-based framework for structure- aware pruning of mixture-of-experts models.Proceedings of the AAAI Conference on Artificial Intelli- gence, 40(27):22976–22984, Mar. 2026

work page 2026
[37]

ApiQ: Finetuning of 2-bit quantized large language model

Baohao Liao, Christian Herold, Shahram Khadivi, and Christof Monz. ApiQ: Finetuning of 2-bit quantized large language model. In Yaser Al-Onaizan, Mohit Bansal, and Yun-Nung Chen, editors, Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 20996–21020, Miami, Florida, USA, November 2024. Association for Computatio...

work page 2024
[38]

Awq: Activation-aware weight quantization for on-device llm compression and acceleration

Ji Lin, Jiaming Tang, Haotian Tang, Shang Yang, Wei-Ming Chen, Wei-Chen Wang, Guangxuan Xiao, Xingyu Dang, Chuang Gan, and Song Han. Awq: Activation-aware weight quantization for on-device llm compression and acceleration. In P. Gibbons, G. Pekhimenko, and C. De Sa, editors,Proceedings of Machine Learning and Systems, volume 6, pages 87–100, 2024. 19 MoEI...

work page 2024
[39]

Emotion detection for misinformation: A review.Information Fusion, 107:102300, 2024

Zhiwei Liu, Tianlin Zhang, Kailai Yang, Paul Thompson, Zeping Yu, and Sophia Ananiadou. Emotion detection for misinformation: A review.Information Fusion, 107:102300, 2024

work page 2024
[40]

Not all experts are equal: Efficient expert pruning and skipping for mixture-of-experts large language models

Xudong Lu, Qi Liu, Yuhui Xu, Aojun Zhou, Siyuan Huang, Bo Zhang, Junchi Yan, and Hongsheng Li. Not all experts are equal: Efficient expert pruning and skipping for mixture-of-experts large language models. In Lun-Wei Ku, Andre Martins, and Vivek Srikumar, editors,Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volu...

work page 2024
[41]

LLM-pruner: On the structural pruning of large language models

Xinyin Ma, Gongfan Fang, and Xinchao Wang. LLM-pruner: On the structural pruning of large language models. InThirty-seventh Conference on Neural Information Processing Systems, 2023

work page 2023
[42]

Llm-pruner: On the structural pruning of large language models, 2023

Xinyin Ma, Gongfan Fang, and Xinchao Wang. Llm-pruner: On the structural pruning of large language models, 2023

work page 2023
[43]

Cmoe: Fast carving of mixture-of-experts for efficient llm inference, 2025

Zehua Pei, Lancheng Zou, Hui-Ling Zhen, Xianzhi Yu, Wulong Liu, Sinno Jialin Pan, Mingxuan Yuan, and Bei Yu. Cmoe: Fast carving of mixture-of-experts for efficient llm inference, 2025

work page 2025
[44]

Winogrande: an adversarial winograd schema challenge at scale.Commun

Keisuke Sakaguchi, Ronan Le Bras, Chandra Bhagavatula, and Yejin Choi. Winogrande: an adversarial winograd schema challenge at scale.Commun. ACM, 64(9):99–106, August 2021

work page 2021
[45]

Smith, and Oren Etzioni

Roy Schwartz, Jesse Dodge, Noah A. Smith, and Oren Etzioni. Green ai.Commun. ACM, 63(12):54–63, November 2020

work page 2020
[46]

C. Shannon. The lattice theory of information.Transactions of the IRE Professional Group on Information Theory, 1(1):105–107, 1953

work page 1953
[47]

C. E. Shannon. A mathematical theory of communication.The Bell System Technical Journal, 27(3):379– 423, 1948

work page 1948
[48]

Outrageously large neural networks: The sparsely-gated mixture-of-experts layer

Noam Shazeer, *Azalia Mirhoseini, *Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton, and Jeff Dean. Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. In International Conference on Learning Representations, 2017

work page 2017
[49]

Cluster ensembles - a knowledge reuse framework for combining multiple partitions.Journal of Machine Learning Research, 3:583–617, 01 2002

Alexander Strehl and Joydeep Ghosh. Cluster ensembles - a knowledge reuse framework for combining multiple partitions.Journal of Machine Learning Research, 3:583–617, 01 2002

work page 2002
[50]

Qwen1.5-moe: Matching 7b model performance with 1/3 activated parameters", February 2024

Qwen Team. Qwen1.5-moe: Matching 7b model performance with 1/3 activated parameters", February 2024

work page 2024
[51]

Llama 2: Open foundation and fine-tuned chat models, 2023

Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Can- ton Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller, Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Har...

work page 2023
[52]

Carbon footprint evaluation of code generation through llm as a service

Tina Vartziotis, Maximilian Schmidt, George Dasoulas, Ippolyti Dellatolas, Stefano Attademo, Viet Dung Le, Anke Wiechmann, Tim Hoffmann, Michael Keckeisen, and Sotirios Kotsopoulos. Carbon footprint evaluation of code generation through llm as a service. In André Casal Kulzer, Hans- Christian Reuss, and Andreas Wagner, editors,2024 Stuttgart International...

work page 2024
[53]

A systematic review of green ai.WIREs Data Mining and Knowledge Discovery, 13(4):e1507, 2023

Roberto Verdecchia, June Sallou, and Luís Cruz. A systematic review of green ai.WIREs Data Mining and Knowledge Discovery, 13(4):e1507, 2023

work page 2023
[54]

H. P. Vinutha, B. Poornima, and B. M. Sagar. Detection of outliers using interquartile range technique from intrusion dataset. In Suresh Chandra Satapathy, Joao Manuel R.S. Tavares, Vikrant Bhateja, and J. R. Mohanty, editors,Information and Decision Sciences, pages 511–518, Singapore, 2018. Springer Singapore

work page 2018
[55]

SmoothQuant: Accurate and efficient post-training quantization for large language models

Guangxuan Xiao, Ji Lin, Mickael Seznec, Hao Wu, Julien Demouth, and Song Han. SmoothQuant: Accurate and efficient post-training quantization for large language models. InProceedings of the 40th International Conference on Machine Learning, 2023

work page 2023
[56]

Smoothquant: accurate and efficient post-training quantization for large language models

Guangxuan Xiao, Ji Lin, Mickael Seznec, Hao Wu, Julien Demouth, and Song Han. Smoothquant: accurate and efficient post-training quantization for large language models. InProceedings of the 40th International Conference on Machine Learning, ICML’23. JMLR.org, 2023

work page 2023
[57]

Instruction-vit: Multi-modal prompts for instruction learning in vision transformer.Information Fusion, 104:102204, 2024

Zhenxiang Xiao, Yuzhong Chen, Junjie Yao, Lu Zhang, Zhengliang Liu, Zihao Wu, Xiaowei Yu, Yi Pan, Lin Zhao, Chong Ma, Xinyu Liu, Wei Liu, Xiang Li, Yixuan Yuan, Dinggang Shen, Dajiang Zhu, Dezhong Yao, Tianming Liu, and Xi Jiang. Instruction-vit: Multi-modal prompts for instruction learning in vision transformer.Information Fusion, 104:102204, 2024

work page 2024
[58]

Moe-pruner: Pruning mixture-of-experts large language model using the hints from its router, 2024

Yanyue Xie, Zhi Zhang, Ding Zhou, Cong Xie, Ziang Song, Xin Liu, Yanzhi Wang, Xue Lin, and An Xu. Moe-pruner: Pruning mixture-of-experts large language model using the hints from its router, 2024

work page 2024
[59] Cheng Yang, Yang Sui, Jinqi Xiao, Lingyi Huang, Yu Gong, Yuanlin Duan, Wenqi Jia, Miao Yin, Yu Cheng, and Bo Yuan. MoE-i2: Compressing mixture of experts models through inter-expert pruning and intra-expert low-rank decomposition. In Yaser Al-Onaizan, Mohit Bansal, and Yun-Nung Chen, editors, Findings of the Association for Computational Linguistics: EMNLP...
[60] Zhihang Yuan, Yuzhang Shang, and Zhen Dong. PB-LLM: Partially binarized large language models. In The Twelfth International Conference on Learning Representations, 2024.
[61] Amir Zandieh, Majid Daliri, Majid Hadian, and Vahab Mirrokni. Turboquant: Online vector quantization with near-optimal distortion rate, 2025.
[62] Rowan Zellers, Ari Holtzman, Yonatan Bisk, Ali Farhadi, and Yejin Choi. HellaSwag: Can a machine really finish your sentence? In Anna Korhonen, David Traum, and Lluís Màrquez, editors, Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4791–4800, Florence, Italy, July 2019. Association for Computational Linguistics.
[63] Kunpeng Zhang, Feng Zhou, Lan Wu, Na Xie, and Zhengbing He. Semantic understanding and prompt engineering for large-scale traffic data imputation. Information Fusion, 102:102038, 2024.
[64] Zeliang Zhang, Xiaodong Liu, Hao Cheng, Chenliang Xu, and Jianfeng Gao. Diversifying the expert knowledge for task-agnostic pruning in sparse mixture-of-experts, 2024.
[65] Yixiao Zhou, Ziyu Zhao, Dongzhou Cheng, Zhiliang Wu, Jie Gui, Yi Yang, Fei Wu, Yu Cheng, and Hehe Fan. Dropping experts, recombining neurons: Retraining-free pruning for sparse mixture-of-experts LLMs. In Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, and Violet Peng, editors, Findings of the Association for Computational Linguistics: EMNL...
[66] Xingchen Zou, Yibo Yan, Xixuan Hao, Yuehong Hu, Haomin Wen, Erdong Liu, Junbo Zhang, Yong Li, Tianrui Li, Yu Zheng, and Yuxuan Liang. Deep learning for cross-domain data fusion in urban computing: Taxonomy, advances, and outlook. Information Fusion, 113:102606, 2025.
