Harnessing Multiple Large Language Models: A Survey on LLM Ensemble

arXiv: 2502.18036 · v6 · submitted 2025-02-25 · cs.CL

Zhijun Chen, Xiaodong Lu, Jingzheng Li, Pengpeng Chen, Zhuoran Li, Kai Sun, Yuankai Luo, Qianren Mao, Ming Li, Likang Xiao, Dingqi Yang, Xiao Huang, Yikun Ban, Hailong Sun, Philip S. Yu

Keywords: ensemble, first, introduce, language, large, LLMs, methods, models

Abstract

LLM Ensemble, which involves the comprehensive use of multiple large language models (LLMs), each aimed at handling user queries during downstream inference, in order to benefit from their individual strengths, has gained substantial attention recently. The widespread availability of LLMs, coupled with their varying strengths and out-of-the-box usability, has profoundly advanced the field of LLM Ensemble. This paper presents the first systematic review of recent developments in LLM Ensemble. First, we introduce our taxonomy of LLM Ensemble and discuss several related research problems. Then, we provide a more in-depth classification of the methods under the broad categories of "ensemble-before-inference", "ensemble-during-inference", and "ensemble-after-inference", and review all relevant methods. Finally, we introduce related benchmarks and applications, summarize existing studies, and suggest several future research directions. A curated list of papers on LLM Ensemble is available at https://github.com/junchenzhi/Awesome-LLM-Ensemble.
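
To make the three broad categories concrete, here is a toy Python sketch. It is not code from the survey, and each function stands in for a whole family of methods. The models are hypothetical callables (or, for the token-level case, precomputed next-token distributions over an assumed shared vocabulary), and the router, the weighted averaging and the majority vote are illustrative choices only.

```python
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical stand-in for an LLM: maps a query to a complete response string.
Model = Callable[[str], str]


def ensemble_before_inference(query: str, models: Dict[str, Model],
                              router: Callable[[str], str]) -> str:
    """Route the query to ONE model chosen before any generation happens."""
    chosen = router(query)  # hypothetical router returning a model name
    return models[chosen](query)


def ensemble_during_inference(step_dists: List[Dict[str, float]],
                              weights: List[float]) -> str:
    """At a single decoding step, average the models' next-token distributions
    (assumes a shared vocabulary) and greedily pick the highest-scoring token."""
    combined: Dict[str, float] = {}
    for dist, w in zip(step_dists, weights):
        for tok, p in dist.items():
            combined[tok] = combined.get(tok, 0.0) + w * p
    return max(combined, key=combined.get)


def ensemble_after_inference(query: str, models: Dict[str, Model]) -> str:
    """Let every model answer in full, then select one answer by majority vote."""
    answers = [m(query) for m in models.values()]
    return Counter(answers).most_common(1)[0][0]


if __name__ == "__main__":
    models = {"a": lambda q: "4", "b": lambda q: "4", "c": lambda q: "5"}
    print(ensemble_before_inference("2+2?", models, lambda q: "a"))  # → 4
    print(ensemble_during_inference([{"x": 0.6, "y": 0.4},
                                     {"x": 0.2, "y": 0.8}], [0.5, 0.5]))  # → y
    print(ensemble_after_inference("2+2?", models))  # → 4
```

The key difference is when the models are combined: the router calls one model, the token-level version needs every model's distribution at every step, and the selection version pays for every model's full response before choosing.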

Forward citations

Cited by 5 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Sampling from Your Language Model One Byte at a Time
cs.CL 2025-06 unverdicted novelty 7.0

An inference-time technique turns BPE-based LMs into byte- or character-level models, solving the prompt boundary problem while unifying vocabularies across different tokenizers.
A Communication-Theoretic Framework for LLM Agents: Cost-Aware Adaptive Reliability
cs.LG 2026-05 unverdicted novelty 6.0

LLM reliability techniques are unified as communication channel operators, with a new cost-aware router achieving superior quality-cost tradeoffs on hard tasks.
Rethinking LLM Ensembling from the Perspective of Mixture Models
cs.LG 2026-05 unverdicted novelty 6.0

ME reinterprets LLM ensembling as a mixture model by sampling a single model stochastically at each token step, matching the ensemble distribution while invoking only one model per step for substantial speed gains.
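
The identity behind this idea can be checked with a short sketch, assuming fixed mixture weights and a shared vocabulary (the summary does not say how ME sets its weights, and the function names below are hypothetical). Drawing a model index i with probability w_i and then a token from that model alone gives P(t) = Σ_i w_i · p_i(t), which is the weighted-average ensemble distribution. For the demo all distributions are precomputed; in practice only the chosen model would need to run at each step.

```python
import random
from collections import Counter
from typing import Dict, List

Dist = Dict[str, float]  # next-token distribution over an assumed shared vocabulary


def mixture_distribution(dists: List[Dist], weights: List[float]) -> Dist:
    """Exact ensemble distribution P(t) = sum_i w_i * p_i(t); needs every model."""
    mix: Dist = {}
    for dist, w in zip(dists, weights):
        for tok, p in dist.items():
            mix[tok] = mix.get(tok, 0.0) + w * p
    return mix


def sample_one_model_per_step(dists: List[Dist], weights: List[float],
                              rng: random.Random) -> str:
    """Two-stage sampling: pick model i with probability w_i, then sample a token
    from p_i only. Marginalizing over i recovers sum_i w_i * p_i(t)."""
    i = rng.choices(range(len(dists)), weights=weights, k=1)[0]
    tokens, probs = zip(*dists[i].items())
    return rng.choices(tokens, weights=probs, k=1)[0]


if __name__ == "__main__":
    dists = [{"a": 0.9, "b": 0.1}, {"a": 0.2, "b": 0.8}]  # two hypothetical models
    weights = [0.5, 0.5]
    rng = random.Random(0)
    n = 100_000
    counts = Counter(sample_one_model_per_step(dists, weights, rng) for _ in range(n))
    print(mixture_distribution(dists, weights)["a"])  # exact: 0.55 up to float rounding
    print(counts["a"] / n)  # empirical frequency, close to 0.55
```
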
SpecFed: Accelerating Federated LLM Inference with Speculative Decoding and Compressed Transmission
eess.SP 2026-04 unverdicted novelty 5.0

SpecFed accelerates federated LLM inference via speculative decoding for parallel processing and top-K compression with server-side reconstruction, achieving high fidelity with reduced communication overhead.
Scoring, Reasoning, and Selecting the Best! Ensembling Large Language Models via a Peer-Review Process
cs.CL 2025-12 unverdicted novelty 5.0

LLM-PeerReview ensembles LLMs by scoring responses with LLM-as-Judge and selecting the best via averaging or truth inference, beating Smoothie-Global by 6.9-7.3 points on four datasets.
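
The averaging variant of the selection step can be sketched as follows. This is an illustration, not LLM-PeerReview's implementation: the LLM-as-Judge calls that would produce the scores are omitted, the judge and response names and score values are made up, and the truth-inference variant is not shown.

```python
from statistics import mean
from typing import Dict

# scores[judge][candidate] -> numeric score. The judges are hypothetical
# stand-ins for LLM-as-Judge calls, which are not shown here.
Scores = Dict[str, Dict[str, float]]


def select_by_average_score(scores: Scores) -> str:
    """Average each candidate's scores over all judges; return the top candidate."""
    candidates = next(iter(scores.values())).keys()
    avg = {c: mean(scores[j][c] for j in scores) for c in candidates}
    return max(avg, key=avg.get)


if __name__ == "__main__":
    scores = {
        "judge_1": {"resp_A": 7, "resp_B": 9, "resp_C": 5},
        "judge_2": {"resp_A": 8, "resp_B": 6, "resp_C": 4},
        "judge_3": {"resp_A": 8, "resp_B": 7, "resp_C": 6},
    }
    print(select_by_average_score(scores))  # → resp_A (means 7.67, 7.33, 5.0)
```

In this example judge_1 alone would pick resp_B; averaging over all judges smooths out such individual preferences.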