PP-OCRv2: Bag of Tricks for Ultra Lightweight OCR System

Bin Lu; Cheng Cui; Chenxia Li; Dianhai Yu; Jun Zhou; Qiwen Liu; Ruoyu Guo; Weiwei Liu; Xiaoguang Hu; Yanjun Ma

arxiv: 2109.03144 · v2 · pith:3POX4UYOnew · submitted 2021-09-07 · 💻 cs.CV

Yuning Du, Chenxia Li, Ruoyu Guo, Cheng Cui, Weiwei Liu, Jun Zhou, Bin Lu, Yehua Yang, Qiwen Liu, Xiaoguang Hu, Dianhai Yu, Yanjun Ma

classification 💻 cs.CV

keywords: pp-ocr, system, lightweight, pp-ocrv2, accuracy, better, efficiency, learning

Optical Character Recognition (OCR) systems have been widely used in a variety of application scenarios, yet designing an OCR system is still a challenging task. In previous work, we proposed a practical ultra lightweight OCR system (PP-OCR) to balance accuracy against efficiency. To improve the accuracy of PP-OCR while keeping its high efficiency, this paper proposes a more robust OCR system, PP-OCRv2. We introduce a bag of tricks to train a better text detector and a better text recognizer, including Collaborative Mutual Learning (CML), CopyPaste, Lightweight CPU Network (LCNet), Unified-Deep Mutual Learning (U-DML) and Enhanced CTCLoss. Experiments on real data show that the precision of PP-OCRv2 is 7% higher than that of PP-OCR under the same inference cost. It is also comparable to the PP-OCR server models, which use ResNet-series backbones. All of the above-mentioned models are open-sourced, and the code is available in the GitHub repository PaddleOCR, which is powered by PaddlePaddle.
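
Two of the tricks named in the abstract (CML and U-DML) build on mutual learning, in which two networks are trained together and each is encouraged to match the other's predictions. The sketch below illustrates only the generic deep mutual learning idea, a symmetric KL divergence between the two networks' softmax outputs. It is not the paper's CML or U-DML formulation; the function names and the equal 0.5 weighting are assumptions made for illustration.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def mutual_learning_loss(logits_a, logits_b):
    """Hypothetical helper: symmetric KL between two networks' predictions.

    Assumption: both directions are weighted equally; a real training recipe
    would add this term to each network's supervised loss.
    """
    p_a = softmax(logits_a)
    p_b = softmax(logits_b)
    return 0.5 * (kl_divergence(p_a, p_b) + kl_divergence(p_b, p_a))


if __name__ == "__main__":
    # Identical predictions give zero loss; disagreeing ones give a positive loss.
    print(mutual_learning_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
    print(mutual_learning_loss([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))
```

The term is zero when the two networks agree and grows as their predictions diverge, so minimizing it pulls the two sets of predictions toward each other.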

Forward citations

Cited by 5 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

ViBR: Automated Bug Replay from Video-based Reports using Vision-Language Models
cs.SE 2026-04 unverdicted novelty 7.0

ViBR reproduces 72% of bugs from video reports by segmenting actions with CLIP similarity and using VLMs for region-aware GUI state comparison, outperforming prior heuristics-based methods.

StrucTab: A Structured Optimization Framework for Table Parsing
cs.CV 2026-06 unverdicted novelty 6.0

StrucTab achieves SOTA table parsing performance by unifying structural subtasks through sequential reasoning and using decomposed RL rewards in Uni-TabRL, plus a new TableVerse-5K benchmark.

General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
cs.CV 2024-09 unverdicted novelty 5.0

GOT is a unified end-to-end model that treats all man-made optical signals as characters and handles multiple OCR tasks including formatted output and interactive region recognition via prompts.

PaddleOCR 3.0 Technical Report
cs.CV 2025-07 unverdicted novelty 4.0

PaddleOCR 3.0 releases compact open-source models for OCR, document structure parsing, and information extraction that rival billion-parameter VLMs.

PP-OCRv6: From 1.5M to 34.5M Parameters, Surpassing Billion-Scale VLMs on OCR Tasks
cs.CV 2026-06 unverdicted novelty 3.0

PP-OCRv6 introduces three tiers of lightweight OCR models (1.5M–34.5M parameters) built on unified MetaFormer blocks with reparameterization that claim superior accuracy to PP-OCRv5 and billion-scale VLMs on in-house ...