PP-OCRv3: More Attempts for the Improvement of Ultra Lightweight OCR System

2022 · arXiv 2206.03001

7 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

method 1

citation-polarity summary

use method 1

representative citing papers

BMD-45: A Large-Scale CCTV Vehicle Detection Dataset for Urban Traffic in Developing Cities

cs.CV · 2026-04-27 · accept · novelty 7.0

BMD-45 is a new large-scale CCTV vehicle detection dataset from developing cities that reveals a 2.5x performance gap for models adapted from prior benchmarks.

MNAFT: modality neuron-aware fine-tuning of multimodal large language models for image translation

cs.CL · 2026-04-18 · unverdicted · novelty 7.0

MNAFT identifies language-agnostic and language-specific neurons via activation analysis and selectively fine-tunes only relevant ones in MLLMs to close the modality gap and outperform full fine-tuning and other methods on image translation benchmarks.

StyleTextGen: Style-Conditioned Multilingual Scene Text Generation

cs.CV · 2026-05-14 · unverdicted · novelty 6.0

StyleTextGen proposes a dual-branch style encoder, text style consistency loss, and mask-guided inference to achieve superior style consistency and cross-lingual performance in multilingual scene text generation on a new bilingual benchmark.

CogVLM2: Visual Language Models for Image and Video Understanding

cs.CV · 2024-08-29 · conditional · novelty 5.0

CogVLM2 family achieves state-of-the-art results on image and video understanding benchmarks through improved visual expert architecture, higher resolution inputs, and automated temporal grounding for videos.

A Proactive EMR Assistant for Doctor-Patient Dialogue: Streaming ASR, Belief Stabilization, and Preliminary Controlled Evaluation

cs.CL · 2026-03-18 · conditional · novelty 4.0

A proactive EMR assistant using streaming ASR and belief stabilization reaches 0.84 state-event F1, 0.87 retrieval Recall@5, and 83.3% coverage in a controlled pilot of ten doctor-patient dialogues.

PaddleOCR 3.0 Technical Report

cs.CV · 2025-07-08 · unverdicted · novelty 4.0

PaddleOCR 3.0 releases compact open-source models for OCR, document structure parsing, and information extraction that rival billion-parameter VLMs.

How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

cs.CV · 2024-04-25 · unverdicted · novelty 4.0

InternVL 1.5 narrows the performance gap to proprietary multimodal models via a stronger transferable vision encoder, dynamic high-resolution tiling, and curated English-Chinese training data.

citing papers explorer

Showing 7 of 7 citing papers.

BMD-45: A Large-Scale CCTV Vehicle Detection Dataset for Urban Traffic in Developing Cities
cs.CV · 2026-04-27 · accept · none · ref 24
BMD-45 is a new large-scale CCTV vehicle detection dataset from developing cities that reveals a 2.5x performance gap for models adapted from prior benchmarks.
MNAFT: modality neuron-aware fine-tuning of multimodal large language models for image translation
cs.CL · 2026-04-18 · unverdicted · none · ref 18
MNAFT identifies language-agnostic and language-specific neurons via activation analysis and selectively fine-tunes only relevant ones in MLLMs to close the modality gap and outperform full fine-tuning and other methods on image translation benchmarks.
StyleTextGen: Style-Conditioned Multilingual Scene Text Generation
cs.CV · 2026-05-14 · unverdicted · none · ref 22
StyleTextGen proposes a dual-branch style encoder, text style consistency loss, and mask-guided inference to achieve superior style consistency and cross-lingual performance in multilingual scene text generation on a new bilingual benchmark.
CogVLM2: Visual Language Models for Image and Video Understanding
cs.CV · 2024-08-29 · conditional · none · ref 32
CogVLM2 family achieves state-of-the-art results on image and video understanding benchmarks through improved visual expert architecture, higher resolution inputs, and automated temporal grounding for videos.
A Proactive EMR Assistant for Doctor-Patient Dialogue: Streaming ASR, Belief Stabilization, and Preliminary Controlled Evaluation
cs.CL · 2026-03-18 · conditional · none · ref 12
A proactive EMR assistant using streaming ASR and belief stabilization reaches 0.84 state-event F1, 0.87 retrieval Recall@5, and 83.3% coverage in a controlled pilot of ten doctor-patient dialogues.
PaddleOCR 3.0 Technical Report
cs.CV · 2025-07-08 · unverdicted · none · ref 51
PaddleOCR 3.0 releases compact open-source models for OCR, document structure parsing, and information extraction that rival billion-parameter VLMs.
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites
cs.CV · 2024-04-25 · unverdicted · none · ref 49
InternVL 1.5 narrows the performance gap to proprietary multimodal models via a stronger transferable vision encoder, dynamic high-resolution tiling, and curated English-Chinese training data.
