Pangu-{\Sigma}: Towards trillion parameter language model with sparse heterogeneous comput- ing

· 2023 · arXiv 2303.10845

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

read on arXiv browse 5 citing papers

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

SAMoRA: Semantic-Aware Mixture of LoRA Experts for Task-Adaptive Learning

cs.CL · 2026-04-21 · unverdicted · novelty 6.0

SAMoRA is a parameter-efficient fine-tuning framework that uses semantic-aware routing and task-adaptive scaling within a Mixture of LoRA Experts to improve multi-task performance and generalization over prior methods.

Fine-grained Approaches for Confidence Calibration of LLMs in Automated Code Revision

cs.SE · 2026-04-08 · unverdicted · novelty 6.0

Local Platt scaling on three fine-grained confidence scores reduces calibration error for LLM-based automated code revision across tasks and models compared to global scaling alone.

DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

cs.CL · 2024-01-11 · unverdicted · novelty 5.0

DeepSeekMoE 2B matches GShard 2.9B performance and approaches a dense 2B model; the 16B version matches LLaMA2-7B at 40% compute by using fine-grained expert segmentation plus shared experts.

A Survey of Large Language Models

cs.CL · 2023-03-31 · accept · novelty 3.0

This survey reviews the background, key techniques, and evaluation methods for large language models, emphasizing emergent abilities that appear at large scales.

A Comprehensive Overview of Large Language Models

cs.CL · 2023-07-12 · unverdicted · novelty 2.0

A survey paper providing an overview of Large Language Models, their background, and recent advances in the field.

citing papers explorer

Showing 5 of 5 citing papers.

SAMoRA: Semantic-Aware Mixture of LoRA Experts for Task-Adaptive Learning cs.CL · 2026-04-21 · unverdicted · none · ref 28
SAMoRA is a parameter-efficient fine-tuning framework that uses semantic-aware routing and task-adaptive scaling within a Mixture of LoRA Experts to improve multi-task performance and generalization over prior methods.
Fine-grained Approaches for Confidence Calibration of LLMs in Automated Code Revision cs.SE · 2026-04-08 · unverdicted · none · ref 79
Local Platt scaling on three fine-grained confidence scores reduces calibration error for LLM-based automated code revision across tasks and models compared to global scaling alone.
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models cs.CL · 2024-01-11 · unverdicted · none · ref 41
DeepSeekMoE 2B matches GShard 2.9B performance and approaches a dense 2B model; the 16B version matches LLaMA2-7B at 40% compute by using fine-grained expert segmentation plus shared experts.
A Survey of Large Language Models cs.CL · 2023-03-31 · accept · none · ref 121
This survey reviews the background, key techniques, and evaluation methods for large language models, emphasizing emergent abilities that appear at large scales.
A Comprehensive Overview of Large Language Models cs.CL · 2023-07-12 · unverdicted · none · ref 92
A survey paper providing an overview of Large Language Models, their background, and recent advances in the field.

Pangu-{\Sigma}: Towards trillion parameter language model with sparse heterogeneous comput- ing

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer