arxiv: 2107.02137 · v1 · pith:DFGC2T2Mnew · submitted 2021-07-05 · 💻 cs.CL

ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation

classification 💻 cs.CL
keywords knowledge · models · language · large-scale · model · tasks · trained · learning

Pre-trained models have achieved state-of-the-art results in various Natural Language Processing (NLP) tasks. Recent works such as T5 and GPT-3 have shown that scaling up pre-trained language models can improve their generalization abilities. In particular, the GPT-3 model with 175 billion parameters shows strong task-agnostic zero-shot/few-shot learning capabilities. Despite their success, these large-scale models are trained on plain texts without introducing knowledge such as linguistic knowledge and world knowledge. In addition, most large-scale models are trained in an auto-regressive way. As a result, this kind of traditional fine-tuning approach demonstrates relatively weak performance when solving downstream language understanding tasks. In order to solve the above problems, we propose a unified framework named ERNIE 3.0 for pre-training large-scale knowledge-enhanced models. It fuses an auto-regressive network and an auto-encoding network, so that the trained model can be easily tailored for both natural language understanding and generation tasks with zero-shot learning, few-shot learning or fine-tuning. We trained the model with 10 billion parameters on a 4TB corpus consisting of plain texts and a large-scale knowledge graph. Empirical results show that the model outperforms the state-of-the-art models on 54 Chinese NLP tasks, and its English version achieves first place on the SuperGLUE benchmark (July 3, 2021), surpassing human performance by 0.8 percentage points (90.6% vs. 89.8%).

Forward citations

Cited by 14 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. OPT: Open Pre-trained Transformer Language Models

    cs.CL 2022-05 unverdicted novelty 7.0

    OPT releases open decoder-only transformers up to 175B parameters that match GPT-3 performance at one-seventh the carbon cost, along with code and training logs.

  2. MiniMax-01: Scaling Foundation Models with Lightning Attention

    cs.CL 2025-01 unverdicted novelty 6.0

    MiniMax-01 models match GPT-4o and Claude-3.5-Sonnet performance while providing 20-32 times longer context windows through lightning attention and MoE scaling.

  3. GPT-NeoX-20B: An Open-Source Autoregressive Language Model

    cs.CL 2022-04 accept novelty 6.0

    GPT-NeoX-20B is a publicly released 20B parameter autoregressive language model trained on the Pile that shows strong gains in five-shot reasoning over similarly sized prior models.

  4. ST-MoE: Designing Stable and Transferable Sparse Expert Models

    cs.CL 2022-02 unverdicted novelty 6.0

    ST-MoE introduces stability techniques for sparse expert models, allowing a 269B-parameter model to achieve state-of-the-art transfer learning results across reasoning, summarization, and QA tasks at the compute cost ...

  5. Ethical and social risks of harm from Language Models

    cs.CL 2021-12 accept novelty 6.0

    The authors provide a detailed taxonomy of 21 risks associated with language models, covering discrimination, information leaks, misinformation, malicious applications, interaction harms, and societal impacts like job...

  6. MotionMAR: Multi-scale Auto-Regressive Human Motion Reconstruction from Sparse Observations

    cs.CV 2026-06 unverdicted novelty 5.0

    A coarse-to-fine autoregressive framework with multi-scale tokenization and scale-aware control reconstructs human motion from sparse observations and reports SOTA accuracy on AMASS.

  7. TIDE: Every Layer Knows the Token Beneath the Context

    cs.CL 2026-05 unverdicted novelty 5.0

    TIDE augments standard transformers with per-layer token embedding injection via an ensemble of memory blocks and a depth-conditioned router to mitigate rare-token undertraining and contextual collapse.

  8. Gyan: An Explainable Neuro-Symbolic Language Model

    cs.CL 2026-05 unverdicted novelty 5.0

    Gyan is a novel explainable neuro-symbolic language model that decouples language modeling from knowledge representation using rhetorical and semantic theories and reports superior performance on multiple datasets.

  9. DuIVRS-2: An LLM-based Interactive Voice Response System for Large-scale POI Attribute Acquisition

    cs.AI 2026-05 unverdicted novelty 4.0

    DuIVRS-2 deploys an LLM-driven IVR pipeline that processes 0.4 million calls per day at 83.9 percent task success rate using FSM-guided augmentation, selective CoT generation, and cooperative policy iteration.

  10. Gyan: An Explainable Neuro-Symbolic Language Model

    cs.CL 2026-05 unverdicted novelty 4.0

    Gyan is a novel explainable non-transformer language model that achieves SOTA results on multiple datasets by mimicking human-like compositional context and world models.

  11. PaddleOCR 3.0 Technical Report

    cs.CV 2025-07 unverdicted novelty 4.0

    PaddleOCR 3.0 releases compact open-source models for OCR, document structure parsing, and information extraction that rival billion-parameter VLMs.

  12. Large Language Models: A Survey

    cs.CL 2024-02 accept novelty 3.0

    The paper surveys key large language models, their training methods, datasets, evaluation benchmarks, and future research directions in the field.

  13. A Survey of Large Language Models

    cs.CL 2023-03 accept novelty 3.0

    This survey reviews the background, key techniques, and evaluation methods for large language models, emphasizing emergent abilities that appear at large scales.

  14. A Comprehensive Overview of Large Language Models

    cs.CL 2023-07 unverdicted novelty 2.0

    A survey paper providing an overview of Large Language Models, their background, and recent advances in the field.