ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation

Chao Pang; Dianhai Yu; Haifeng Wang; Hao Tian; Hua Wu; Jianzhong Liang; Jiaxiang Liu; Junyuan Shang; Peng Sun; Shikun Feng

arxiv: 2107.02137 · v1 · pith:DFGC2T2Mnew · submitted 2021-07-05 · 💻 cs.CL

Yu Sun, Shuohuan Wang, Shikun Feng, Siyu Ding, Chao Pang, Junyuan Shang, Jiaxiang Liu, Xuyi Chen, Yanbin Zhao, Yuxiang Lu, Weixin Liu, Zhihua Wu, Weibao Gong, Jianzhong Liang, Zhizhou Shang, Peng Sun, Wei Liu, Xuan Ouyang, Dianhai Yu, Hao Tian, Hua Wu, Haifeng Wang

classification 💻 cs.CL

keywords knowledge, models, language, large-scale, model, tasks, trained, learning

Pre-trained models have achieved state-of-the-art results in various Natural Language Processing (NLP) tasks. Recent works such as T5 and GPT-3 have shown that scaling up pre-trained language models can improve their generalization abilities. Particularly, the GPT-3 model with 175 billion parameters shows its strong task-agnostic zero-shot/few-shot learning capabilities. Despite their success, these large-scale models are trained on plain texts without introducing knowledge such as linguistic knowledge and world knowledge. In addition, most large-scale models are trained in an auto-regressive way. As a result, this kind of traditional fine-tuning approach demonstrates relatively weak performance when solving downstream language understanding tasks. In order to solve the above problems, we propose a unified framework named ERNIE 3.0 for pre-training large-scale knowledge enhanced models. It fuses auto-regressive network and auto-encoding network, so that the trained model can be easily tailored for both natural language understanding and generation tasks with zero-shot learning, few-shot learning or fine-tuning. We trained the model with 10 billion parameters on a 4TB corpus consisting of plain texts and a large-scale knowledge graph. Empirical results show that the model outperforms the state-of-the-art models on 54 Chinese NLP tasks, and its English version achieves the first place on the SuperGLUE benchmark (July 3, 2021), surpassing the human performance by +0.8% (90.6% vs. 89.8%).

Forward citations

Cited by 14 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

OPT: Open Pre-trained Transformer Language Models
cs.CL 2022-05 unverdicted novelty 7.0

OPT releases open decoder-only transformers up to 175B parameters that match GPT-3 performance at one-seventh the carbon cost, along with code and training logs.

MiniMax-01: Scaling Foundation Models with Lightning Attention
cs.CL 2025-01 unverdicted novelty 6.0

MiniMax-01 models match GPT-4o and Claude-3.5-Sonnet performance while providing 20-32 times longer context windows through lightning attention and MoE scaling.

GPT-NeoX-20B: An Open-Source Autoregressive Language Model
cs.CL 2022-04 accept novelty 6.0

GPT-NeoX-20B is a publicly released 20B parameter autoregressive language model trained on the Pile that shows strong gains in five-shot reasoning over similarly sized prior models.

ST-MoE: Designing Stable and Transferable Sparse Expert Models
cs.CL 2022-02 unverdicted novelty 6.0

ST-MoE introduces stability techniques for sparse expert models, allowing a 269B-parameter model to achieve state-of-the-art transfer learning results across reasoning, summarization, and QA tasks at the compute cost ...

Ethical and social risks of harm from Language Models
cs.CL 2021-12 accept novelty 6.0

The authors provide a detailed taxonomy of 21 risks associated with language models, covering discrimination, information leaks, misinformation, malicious applications, interaction harms, and societal impacts like job...

MotionMAR: Multi-scale Auto-Regressive Human Motion Reconstruction from Sparse Observations
cs.CV 2026-06 unverdicted novelty 5.0

A coarse-to-fine autoregressive framework with multi-scale tokenization and scale-aware control reconstructs human motion from sparse observations and reports SOTA accuracy on AMASS.

TIDE: Every Layer Knows the Token Beneath the Context
cs.CL 2026-05 unverdicted novelty 5.0

TIDE augments standard transformers with per-layer token embedding injection via an ensemble of memory blocks and a depth-conditioned router to mitigate rare-token undertraining and contextual collapse.

Gyan: An Explainable Neuro-Symbolic Language Model
cs.CL 2026-05 unverdicted novelty 5.0

Gyan is a novel explainable neuro-symbolic language model that decouples language modeling from knowledge representation using rhetorical and semantic theories and reports superior performance on multiple datasets.

DuIVRS-2: An LLM-based Interactive Voice Response System for Large-scale POI Attribute Acquisition
cs.AI 2026-05 unverdicted novelty 4.0

DuIVRS-2 deploys an LLM-driven IVR pipeline that processes 0.4 million calls per day at 83.9 percent task success rate using FSM-guided augmentation, selective CoT generation, and cooperative policy iteration.

Gyan: An Explainable Neuro-Symbolic Language Model
cs.CL 2026-05 unverdicted novelty 4.0

Gyan is a novel explainable non-transformer language model that achieves SOTA results on multiple datasets by mimicking human-like compositional context and world models.

PaddleOCR 3.0 Technical Report
cs.CV 2025-07 unverdicted novelty 4.0

PaddleOCR 3.0 releases compact open-source models for OCR, document structure parsing, and information extraction that rival billion-parameter VLMs.

Large Language Models: A Survey
cs.CL 2024-02 accept novelty 3.0

The paper surveys key large language models, their training methods, datasets, evaluation benchmarks, and future research directions in the field.

A Survey of Large Language Models
cs.CL 2023-03 accept novelty 3.0

This survey reviews the background, key techniques, and evaluation methods for large language models, emphasizing emergent abilities that appear at large scales.

A Comprehensive Overview of Large Language Models
cs.CL 2023-07 unverdicted novelty 2.0

A survey paper providing an overview of Large Language Models, their background, and recent advances in the field.