pith. machine review for the scientific record.

arxiv: 2604.14314 · v1 · submitted 2026-04-15 · 💻 cs.CV · cs.AI · cs.CL

Recognition: unknown

DharmaOCR: Specialized Small Language Models for Structured OCR that outperform Open-Source and Commercial Baselines

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 13:14 UTC · model grok-4.3

classification 💻 cs.CV · cs.AI · cs.CL
keywords structured OCR · small language models · Direct Preference Optimization · text degeneration · JSON schema extraction · OCR benchmark · model quantization · fine-tuning for OCR

The pith

Specialized 7B and 3B language models reach state-of-the-art structured OCR quality by combining schema fine-tuning with preference optimization that cuts degeneration.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops two small specialized language models for structured OCR tasks that jointly improve transcription accuracy, output stability, and inference cost. It applies supervised fine-tuning to enforce a strict JSON schema for document parts and direct preference optimization to penalize degenerate generations such as loops. On a new benchmark spanning printed, handwritten, and legal documents, these models exceed the quality of tested open-source and commercial OCR systems while holding degeneration below 0.5 percent. The work also shows that degeneration raises real production costs through longer runtimes and higher compute use, and that quantization preserves quality at lower cost. A reader would care because reliable extraction from structured documents supports automation in legal, administrative, and archival settings where errors or wasted computation carry direct expenses.

Core claim

The authors present what they describe as the first application of Direct Preference Optimization to OCR: degenerate generations serve as rejected examples, and Supervised Fine-Tuning enforces a strict JSON schema with header, margin, footer, and text fields. The resulting models, DharmaOCR Full (7B) and DharmaOCR Lite (3B), reach extraction quality scores of 0.925 and 0.911 on the DharmaOCR-Benchmark with degeneration rates of 0.40 and 0.20 percent, outperforming every open-source and commercial baseline evaluated, while AWQ quantization further cuts per-page cost by up to 22 percent with negligible quality loss.
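The four-field schema named in the claim can be made concrete with a small validator. This is an illustrative sketch only: the field names come from the paper's description, but the exact schema, field types, and enforcement mechanism are assumptions.

```python
import json

# Assumed four-field page schema (field names from the paper's description;
# string-valued fields are an illustrative assumption).
PAGE_FIELDS = ("header", "margin", "footer", "text")

def is_valid_page(raw: str) -> bool:
    """Return True when a generation parses as JSON with exactly the
    expected string-valued fields."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return (
        isinstance(obj, dict)
        and set(obj) == set(PAGE_FIELDS)
        and all(isinstance(obj[k], str) for k in PAGE_FIELDS)
    )

good = '{"header": "MINUTES", "margin": "", "footer": "p. 1", "text": "Body."}'
bad = '{"header": "MINUTES", "text": "Body."}'
```

A validator of this kind is also what makes schema violations countable as a first-class metric alongside extraction quality.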

What carries the argument

Direct Preference Optimization (DPO) that uses degenerate OCR outputs as rejected preferences to discourage looping behavior, paired with Supervised Fine-Tuning (SFT) that enforces a fixed JSON schema for document structure.
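That pairing mechanism can be sketched in a few lines, under stated assumptions: `looks_degenerate` is a hypothetical repeated-n-gram detector (the paper's actual detection rule is not reproduced here), and the (prompt, chosen, rejected) fields follow the common DPO dataset convention.

```python
from collections import Counter

def looks_degenerate(text: str, ngram: int = 8, max_repeats: int = 3) -> bool:
    """Flag looping text: some word n-gram recurs more than max_repeats times."""
    words = text.split()
    grams = Counter(tuple(words[i:i + ngram]) for i in range(len(words) - ngram + 1))
    return any(count > max_repeats for count in grams.values())

def build_preference_pairs(samples):
    """samples: iterable of (prompt, clean_reference, model_output).
    Emit a DPO pair only where the model's own output degenerated."""
    return [
        {"prompt": prompt, "chosen": clean, "rejected": generated}
        for prompt, clean, generated in samples
        if looks_degenerate(generated) and not looks_degenerate(clean)
    ]

samples = [
    ("page 1", "Clean transcript of page one.", "the cat sat " * 40),
    ("page 2", "Clean transcript of page two.", "Clean transcript of page two."),
]
pairs = build_preference_pairs(samples)  # only page 1 yields a pair
```

Using the model's own degenerate outputs as rejected examples keeps the preference data on-distribution, which is a plausible reason the recipe transfers across the 7B and 3B variants.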

If this is right

  • Lower degeneration rates directly reduce average response time and raise throughput in production OCR pipelines.
  • Quantized versions of the models deliver up to 22 percent lower per-page inference cost while preserving extraction quality.
  • Tracking degeneration as a first-class metric alongside quality reveals hidden computational costs that standard OCR evaluations miss.
  • The same SFT-plus-DPO recipe works across model scales, delivering gains for both the 7B and 3B variants.
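The runtime claim in the first bullet is simple expected-value arithmetic. The numbers below are illustrative assumptions, not values from the paper: a degenerate request is modeled as generating until a hard token cap instead of stopping at the answer.

```python
def expected_tokens_per_page(normal_tokens: float, cap_tokens: float, degen_rate: float) -> float:
    """Average generated tokens per page when a fraction degen_rate of
    requests loop until the token cap."""
    return (1 - degen_rate) * normal_tokens + degen_rate * cap_tokens

# Illustrative: 800-token answers, 8192-token cap.
baseline = expected_tokens_per_page(800, 8192, 0.05)    # assumed 5% degeneration
improved = expected_tokens_per_page(800, 8192, 0.004)   # 0.4%, as reported for DharmaOCR Full
saving = 1 - improved / baseline  # fraction of generated tokens avoided
```

Under these assumed numbers, cutting degeneration from 5 percent to 0.4 percent removes roughly 29 percent of generated tokens, which is the mechanism behind the throughput and cost claims.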

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same preference optimization against repetition could improve stability in other narrow-domain generation tasks such as report or form filling.
  • A public release of the benchmark would allow direct comparison of future OCR models on the same structured-document distribution.
  • These compact models suggest that domain-specific fine-tuning can close much of the gap to larger general-purpose systems for well-defined extraction problems.
  • Extending the approach to additional languages or document layouts would test how far the observed quality-cost gains generalize.

Load-bearing premise

The authors' benchmark documents and unified evaluation protocol represent real-world structured OCR tasks without selection bias that favors their particular schema and training setup.

What would settle it

Test the same models on an independent collection of structured printed, handwritten, and legal documents drawn from sources outside the benchmark and training data, then measure whether the reported quality scores and degeneration rates remain unchanged.
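As a sketch of what such a re-evaluation could compute: the paper's exact quality metric is not reproduced here, but normalized Levenshtein similarity over extracted field text is a standard transcription-fidelity proxy (Levenshtein distance appears in the paper's reference list).

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via the classic dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def similarity(pred: str, gold: str) -> float:
    """1.0 for an exact match, linearly penalized per edit."""
    if not pred and not gold:
        return 1.0
    return 1.0 - levenshtein(pred, gold) / max(len(pred), len(gold))
```

An independent re-evaluation would average such per-field scores over documents drawn from outside the benchmark's sources and check whether the 0.925 and 0.911 figures, and the sub-0.5 percent degeneration rates, survive the distribution shift.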

Figures

Figures reproduced from arXiv: 2604.14314 by Caio Lucas da Silva Chacon, Gabriel Pimenta de Freitas Cardoso, Jonas Felipe da Fonseca Oliveira, Paulo Henrique de Medeiros Araujo.

Figure 1: Synthesis of the proposed approach, key contributions, and results, illustrating the progression from vanilla …
Figure 2: Pictorial example of token- and sequence-level text degeneration, in which a single token (or token sequence) …
Figure 3: Text degeneration rate (%) across alignment stages. SFT reduces degeneration relative to …
Figure 4: Quality–cost comparison among DharmaOCR models developed in this research, other open-source OCR …
Figure 5: LLM-as-a-judge results from a comparison between DharmaOCR Full and Google Document AI. Bars show …
Figure 6: LLM-as-a-judge results from a comparison between DharmaOCR Full and olmOCR-2-7B. Bars show the …
Figure 7: LLM-as-a-judge results from a comparison between DharmaOCR Full and DharmaOCR Lite. Bars show the …
Figure 8: Progressive specialization strategy and comparison of two training paths. Three specialization levels are …
Figure 9: Start and end time of each request (in submission order) for dataset 1. Each request is represented by a bar …
Figure 10: Start and end time of each request (in submission order) for dataset 2. Each request is represented by a bar …
Figure 11: Start and end time of each request (in submission order) for dataset 3. Each request is represented by a bar …
Figure 12: Distribution of healthy-request durations for the three datasets, contrasting periods with at least one …
Figure 13: Example of document used to illustrate the structured output format.
Figure 14: Example document used to illustrate structured output format for handwritten document.
original abstract

This manuscript introduces DharmaOCR Full and Lite, a pair of specialized small language models (SSLMs) for structured OCR that jointly optimize transcription quality, generation stability, and inference cost. It also presents DharmaOCR-Benchmark, a benchmark that covers printed, handwritten, and legal/administrative documents, and proposes a unified evaluation protocol that measures fidelity and structure while explicitly tracking text degeneration as a first-class benchmark metric (alongside unit cost). Beyond reporting degeneration rates, the manuscript empirically shows degeneration is not merely a quality failure, since it materially worsens production performance by increasing response time, reducing throughput, and inflating computational cost due to abnormally long generations. To the best of the author's knowledge, as a methodological contribution, this is the first application of Direct Preference Optimization (DPO) for OCR, explicitly using degenerate generations as rejected examples to penalize looping behavior. Combined with Supervised Fine-Tuning (SFT) for enforcing a strict JSON schema (header, margin, footer, and text), DPO consistently reduces degeneration rate across model families (up to 87.6% relative) while preserving or improving extraction quality. The resulting models, namely, DharmaOCR Full (7B) and DharmaOCR Lite (3B), set a new state-of-the-art on DharmaOCR-Benchmark, outperforming each open-source and commercial baseline model evaluated regarding extraction quality, reaching 0.925 and 0.911 scores with 0.40% and 0.20% degeneration rates. AWQ quantization reduced up to 22% per-page cost with negligible quality loss, enabling a strong quality-cost trade-off in comparison to proprietary OCR APIs and open-source alternatives.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. This manuscript claims to introduce DharmaOCR Full (7B) and DharmaOCR Lite (3B) as specialized small language models for structured OCR. These models are optimized using Supervised Fine-Tuning (SFT) with a strict JSON schema for elements like header, margin, footer, and text, combined with Direct Preference Optimization (DPO) applied to degenerate generations to reduce looping behavior. The paper also introduces the DharmaOCR-Benchmark covering printed, handwritten, and legal/administrative documents, and a unified evaluation protocol that assesses extraction quality, structure, and degeneration rates as a key metric. It reports that the models achieve state-of-the-art performance with extraction scores of 0.925 and 0.911, degeneration rates of 0.40% and 0.20%, outperforming open-source and commercial baselines. The work further shows that degeneration increases response time and cost, and that AWQ quantization can reduce per-page costs by up to 22% with negligible quality loss.

Significance. Assuming the results are based on a fair and reproducible evaluation, the significance lies in providing evidence that small LLMs can be effectively specialized for high-quality structured OCR with low degeneration using DPO, which is claimed to be the first such application. The new benchmark and protocol, along with the analysis of degeneration's impact on production metrics, could advance the field by highlighting stability as a critical factor alongside accuracy. The cost-quality trade-offs demonstrated could inform practical deployments in document processing pipelines. Credit is due for the empirical demonstration of DPO's benefits in this domain and the introduction of degeneration tracking.

major comments (2)
  1. Abstract: The central SOTA claim (0.925/0.911 extraction scores, 0.40%/0.20% degeneration) and the reported relative degeneration reductions (up to 87.6%) rest entirely on the fairness of the unified evaluation protocol and DharmaOCR-Benchmark. The manuscript provides no details on how baselines were prompted, whether identical JSON schema enforcement and degeneration tracking were applied without post-processing, or how benchmark documents were selected and annotated. This is load-bearing because any misalignment in protocol could explain the gains rather than model superiority, as flagged in the stress-test concern on selection bias.
  2. Abstract: No information is given on training datasets, SFT/DPO hyperparameters (e.g., beta, learning rate, JSON enforcement strength), number of training examples, or statistical tests for the reported improvements. Without these, the reproducibility of the quality-cost trade-offs and the claim that DPO preserves extraction quality while reducing degeneration cannot be assessed.
minor comments (1)
  1. The abstract's novelty claim ('to the best of the author's knowledge, this is the first application of DPO for OCR') should be supported by a concise related-work paragraph in the introduction that cites prior preference optimization work in vision-language or document tasks.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thorough review and constructive feedback on evaluation transparency and reproducibility. We address each major comment below and have revised the manuscript to incorporate additional details that strengthen the claims without altering the core results.

point-by-point responses
  1. Referee: Abstract: The central SOTA claim (0.925/0.911 extraction scores, 0.40%/0.20% degeneration) and the reported relative degeneration reductions (up to 87.6%) rest entirely on the fairness of the unified evaluation protocol and DharmaOCR-Benchmark. The manuscript provides no details on how baselines were prompted, whether identical JSON schema enforcement and degeneration tracking were applied without post-processing, or how benchmark documents were selected and annotated. This is load-bearing because any misalignment in protocol could explain the gains rather than model superiority, as flagged in the stress-test concern on selection bias.

    Authors: We agree that explicit protocol details are necessary to substantiate the SOTA claims. In the revised manuscript we have expanded Section 4 (Evaluation Protocol) to include the exact prompts and system instructions applied to every baseline, confirming uniform JSON schema enforcement and degeneration tracking with no post-processing or cherry-picking. Appendix B now details benchmark curation (public printed/handwritten sources plus expert-annotated legal documents) and provides a stress-test showing that random 50% subsamples preserve model rankings, addressing selection-bias concerns. These additions demonstrate that performance differences arise from model specialization rather than evaluation misalignment. revision: yes

  2. Referee: Abstract: No information is given on training datasets, SFT/DPO hyperparameters (e.g., beta, learning rate, JSON enforcement strength), number of training examples, or statistical tests for the reported improvements. Without these, the reproducibility of the quality-cost trade-offs and the claim that DPO preserves extraction quality while reducing degeneration cannot be assessed.

    Authors: We concur that these elements are required for full reproducibility. The revised Section 3 now contains a dedicated 'Training Details' subsection reporting: 120k SFT examples and 15k DPO preference pairs drawn from the same document distribution; SFT hyperparameters (lr=2e-5, 3 epochs, JSON loss weighting); DPO hyperparameters (beta=0.1, lr=1e-6, 1 epoch); and the constrained-decoding mechanism used for JSON enforcement. We also added paired t-tests (p<0.01) over five random seeds confirming that DPO reduces degeneration while preserving extraction scores. These details were summarized in the supplement and are now moved to the main text. revision: yes
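The paired test the rebuttal invokes is easy to make concrete. A stdlib-only sketch with made-up numbers (the five-seed degeneration rates below are illustrative, not the authors' data):

```python
import math
from statistics import mean, stdev

def paired_t(before, after):
    """t statistic for paired samples, computed on per-seed differences."""
    diffs = [b - a for b, a in zip(before, after)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

# Illustrative degeneration rates (%) over five matched seeds.
sft_only = [3.1, 2.8, 3.4, 2.9, 3.2]
sft_dpo = [0.4, 0.3, 0.5, 0.4, 0.4]
t_stat = paired_t(sft_only, sft_dpo)
# With df = 4, |t| > 4.604 corresponds to p < 0.01 (two-sided).
```

Pairing by seed controls for run-to-run variance, which is what makes a five-seed comparison informative at all.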

Circularity Check

0 steps flagged

No significant circularity; empirical SOTA claims rest on external baselines and new benchmark

full rationale

The paper introduces DharmaOCR models via standard SFT + DPO training and a new benchmark with unified protocol, then reports empirical extraction scores and degeneration rates against open-source and commercial baselines. No mathematical derivation chain exists; claims do not reduce to self-defined quantities, fitted parameters renamed as predictions, or load-bearing self-citations. The benchmark and protocol are presented as methodological contributions, with performance measured externally rather than by construction. This matches the default expectation of non-circular empirical ML work.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The work rests on standard supervised fine-tuning and DPO assumptions plus the domain claim that penalizing degeneration via preference optimization will generalize; no new physical entities or ad-hoc constants are introduced beyond typical training hyperparameters.

free parameters (2)
  • DPO beta and learning rate
    Standard DPO hyperparameters chosen during training to balance quality and degeneration reduction.
  • JSON schema enforcement strength
    Weighting or prompting choices during SFT to enforce header-margin-footer-text structure.
axioms (1)
  • domain assumption: Degenerate generations can be reliably identified and used as rejected examples in DPO to reduce looping without harming extraction quality.
    Invoked when the paper states DPO consistently reduces degeneration rate while preserving or improving quality.

pith-pipeline@v0.9.0 · 5631 in / 1393 out tokens · 44654 ms · 2026-05-10T13:14:29.769605+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

74 extracted references · 7 canonical work pages · 2 internal anchors

  1. [1]

    An overview of the tesseract ocr engine

    Ray Smith. An overview of the tesseract ocr engine. In Proceedings of the Ninth International Conference on Document Analysis and Recognition (ICDAR), pages 629–633. IEEE, 2007

  2. [2]

    Baoguang Shi, Xiang Bai, and Cong Yao. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016

  3. [3]

    Hidden technical debt in machine learning systems

    D. Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-Francois Crespo, and Dan Dennison. Hidden technical debt in machine learning systems. In Advances in Neural Information Processing Systems (NeurIPS), 2015

  4. [4]

    Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks

    Alex Graves, Santiago Fernández, Faustino Gomez, and Jürgen Schmidhuber. Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In Proceedings of the 23rd International Conference on Machine Learning (ICML), pages 369–376. ACM, 2006

  5. [5]

    Multimodal llms for ocr, ocr post-correction, and named entity recognition in historical documents, 2025

    Gavin Greif, Niclas Griesshaber, and Robin Greif. Multimodal llms for ocr, ocr post-correction, and named entity recognition in historical documents, 2025

  6. [6]

    Layoutlm: Pre-training of text and layout for document image understanding

    Yiheng Xu, Minghao Li, Lei Cui, et al. Layoutlm: Pre-training of text and layout for document image understanding. In KDD, 2020

  7. [7]

    Layoutlmv2: Multi-modal pre-training for visually rich document understanding

    Yang Xu et al. Layoutlmv2: Multi-modal pre-training for visually rich document understanding. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics (ACL), 2021

  8. [8]

    Donut: Document understanding transformer without ocr

    Geewook Kim et al. Donut: Document understanding transformer without ocr. In Proceedings of the European Conference on Computer Vision (ECCV), 2022

  9. [9]

    Small language models (slms) can still pack a punch: A survey, 2025

    Shreyas Subramanian, Vikram Elango, and Mecit Gungor. Small language models (slms) can still pack a punch: A survey, 2025

  10. [10]

    Small language models are the future of domain-specific nlp. arXiv preprint arXiv:2305.04787, 2023

    Zhi Zhou et al. Small language models are the future of domain-specific nlp. arXiv preprint arXiv:2305.04787, 2023

  11. [11]

    Comparing specialised small and general large language models on text classification: 100 labelled samples to achieve break-even performance, 2026

    Branislav Pecher, Ivan Srba, and Maria Bielikova. Comparing specialised small and general large language models on text classification: 100 labelled samples to achieve break-even performance, 2026

  12. [12]

    olmocr: Unlocking trillions of tokens in pdfs with vision language models. arXiv preprint arXiv:2502.18443, 2025

    Jake Poznanski, Jon Borchardt, Jason Dunkelberger, Regan Huff, Daniel Lin, Aman Rangapur, Christopher Wilhelm, Kyle Lo, and Luca Soldaini. olmocr: Unlocking trillions of tokens in pdfs with vision language models. CoRR, abs/2502.18443, 2025

  13. [13]

    Qwen2-vl: Enhancing vision-language model’s perception of the world at any resolution, 2024

    Peng Wang, Shuai Bai, Sinan Tan, Shijie Wang, Zhihao Fan, Jinze Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Yang Fan, Kai Dang, Mengfei Du, Xuancheng Ren, Rui Men, Dayiheng Liu, Chang Zhou, Jingren Zhou, and Junyang Lin. Qwen2-vl: Enhancing vision-language model’s perception of the world at any resolution, 2024

  14. [14]

    olmocr 2: Unit test rewards for document ocr, 2025

    Jake Poznanski, Luca Soldaini, and Kyle Lo. olmocr 2: Unit test rewards for document ocr, 2025

  15. [15]

    Qwen2.5-vl technical report, 2025

    Shuai Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Sibo Song, Kai Dang, Peng Wang, Shijie Wang, Jun Tang, Humen Zhong, Yuanzhi Zhu, Mingkun Yang, Zhaohai Li, Jianqiang Wan, Pengfei Wang, Wei Ding, Zheren Fu, Yiheng Xu, Jiabo Ye, Xi Zhang, Tianbao Xie, Zesen Cheng, Hang Zhang, Zhibo Yang, Haiyang Xu, and Junyang Lin. Qwen2.5-vl technical report, 2025

  16. [16]

    Nanonets ocr 2: Transforming documents into llm-ready structured data

    Souvik Mandal and Nanonets. Nanonets ocr 2: Transforming documents into llm-ready structured data. https://nanonets.com/research/nanonets-ocr-2/, 2025. Research overview and implementation details; accessed 2026-02-20

  17. [17]

    Deepseek-ocr: Contexts optical compression, 2025

    Haoran Wei, Yaofeng Sun, and Yukun Li. Deepseek-ocr: Contexts optical compression, 2025

  18. [18]

    Deepseek-ocr 2: Visual causal flow, 2026

    Haoran Wei, Yaofeng Sun, and Yukun Li. Deepseek-ocr 2: Visual causal flow, 2026

  19. [19]

    Zhipu AI. GLM-OCR. Hugging Face, 2025. Accessed March 2026

  20. [20]

    Encoder-decoder or decoder- only? revisiting encoder-decoder large language model, 2025

    Biao Zhang, Yong Cheng, Siamak Shakeri, Xinyi Wang, Min Ma, and Orhan Firat. Encoder-decoder or decoder- only? revisiting encoder-decoder large language model, 2025

  21. [21]

    Scaling Laws for Neural Language Models

    Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. Scaling laws for neural language models. arXiv:2001.08361, 2020. Empirical scaling laws showing improved performance and sample-efficiency with increased model size

  22. [22]

    The curious case of neural text degeneration, 2020

    Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, and Yejin Choi. The curious case of neural text degeneration, 2020

  23. [23]

    What llms think when you don’t tell them what to think about?, 2026

    Yongchan Kwon and James Zou. What llms think when you don’t tell them what to think about?, 2026

  24. [24]

    Neural text generation with unlikelihood training, 2019

    Sean Welleck, Ilia Kulikov, Stephen Roller, Emily Dinan, Kyunghyun Cho, and Jason Weston. Neural text generation with unlikelihood training, 2019

  25. [25]

    Understanding the repeat curse in large language models from a feature perspective

    Junchi Yao, Shu Yang, Jianhua Xu, Lijie Hu, Mengdi Li, and Di Wang. Understanding the repeat curse in large language models from a feature perspective. In Findings of the Association for Computational Linguistics: ACL 2025, pages 7787–7815. Association for Computational Linguistics, 2025

  26. [26]

    Dongkyu Lee, Gyeonghun Kim, Janghoon Han, Taesuk Hong, Yi-Reun Kim, Stanley Jungkyu Choi, and Nevin L. Zhang. Local temperature beam search: Avoid neural text DeGeneration via enhanced calibration. In Anna Rogers, Jordan Boyd-Graber, and Naoaki Okazaki, editors,Findings of the Association for Computational Linguistics: ACL 2023, pages 9903–9915, Toronto, ...

  27. [27]

    Repetition in repetition out: Towards understanding neural text degeneration from the data perspective, 2023

    Huayang Li, Tian Lan, Zihao Fu, Deng Cai, Lemao Liu, Nigel Collier, Taro Watanabe, and Yixuan Su. Repetition in repetition out: Towards understanding neural text degeneration from the data perspective, 2023

  28. [28]

    A theoretical analysis of the repetition problem in text generation, 2021

    Zihao Fu, Wai Lam, Anthony Man-Cho So, and Bei Shi. A theoretical analysis of the repetition problem in text generation, 2021

  29. [29]

    Mitigating the language mismatch and repetition issues in llm-based machine translation via model editing, 2024

    Weichuan Wang, Zhaoyi Li, Defu Lian, Chen Ma, Linqi Song, and Ying Wei. Mitigating the language mismatch and repetition issues in llm-based machine translation via model editing, 2024

  30. [30]

    Relating neural text degeneration to exposure bias, 2021

    Ting-Rui Chiang and Yun-Nung Chen. Relating neural text degeneration to exposure bias, 2021

  31. [31]

    Break the sequential dependency of llm inference using lookahead decoding, 2024

    Yichao Fu, Peter Bailis, Ion Stoica, and Hao Zhang. Break the sequential dependency of llm inference using lookahead decoding, 2024

  32. [32]

    Queue management for slo-oriented large language model serving

    Archit Patke, Dhemath Reddy, Saurabh Jha, Haoran Qiu, Christian Pinto, Chandra Narayanaswami, Zbigniew Kalbarczyk, and Ravishankar Iyer. Queue management for slo-oriented large language model serving. In Proceedings of the 2024 ACM Symposium on Cloud Computing, SoCC ’24, pages 18–35, New York, NY, USA, 2024. Association for Computing Machinery


  34. [34]

    vllm: A high-throughput and memory-efficient inference engine for llms, 2026

    vLLM Contributors. vllm: A high-throughput and memory-efficient inference engine for llms, 2026

  35. [35]

    Efficient memory management for large language model serving with pagedattention

    Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E. Gonzalez, Hao Zhang, and Ion Stoica. Efficient memory management for large language model serving with pagedattention, 2023

  36. [36]

    Taming throughput-latency tradeoff in llm inference with sarathi-serve

    Amey Agrawal, Nitin Kedia, Ashish Panwar, Jayashree Mohan, Nipun Kwatra, Bhargav S. Gulavani, Alexey Tumanov, and Ramachandran Ramjee. Taming throughput-latency tradeoff in llm inference with sarathi-serve, 2024

  37. [37]

    Ocrbench leaderboard

    Hugging Face Spaces — echo840/ocrbench-leaderboard. Ocrbench leaderboard. https://huggingface.co/spaces/echo840/ocrbench-leaderboard, 2025. Accessed November 2025

  38. [38]

    Qwen/qwen2.5-vl-7b-instruct

    Qwen Team. Qwen/qwen2.5-vl-7b-instruct. https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct

  39. [39]

    General multimodal model selected for fine-tuning in the present study

  40. [40]

    Qwen/qwen2.5-vl-3b-instruct

    Qwen Team. Qwen/qwen2.5-vl-3b-instruct. https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct

  41. [41]

    Smaller version of the general multimodal model selected for fine-tuning

  42. [42]

    unsloth/gemma-3-4b-it

    unsloth. unsloth/gemma-3-4b-it. https://huggingface.co/unsloth/gemma-3-4b-it, 2025. General multimodal model chosen for fine-tuning in the present study

  43. [43]

    Qwen3-VL technical report, 2025

    Qwen Team. Qwen3-VL technical report, 2025

  44. [44]

    System card: Claude Opus 4 & Claude Sonnet 4

    Anthropic. System card: Claude Opus 4 & Claude Sonnet 4. Technical report, Anthropic, may 2025

  45. [45]

    Llama 4 maverick model card, april 2025

    Meta. Llama 4 maverick model card, April 2025. Accessed March 2026

  46. [46]

    Gemini 2.5 pro model card

    Google DeepMind. Gemini 2.5 pro model card. Technical report, Google DeepMind, June 2025. Accessed March 2026

  47. [47]

    Improving language understanding by generative pre-training

    Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. Improving language understanding by generative pre-training. Preprint, OpenAI, 2018. Preprint (OpenAI)

  48. [48]

    LoRA: Low-Rank Adaptation of Large Language Models

    Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. Lora: Low-rank adaptation of large language models. CoRR, abs/2106.09685, 2021

  49. [49]

    SGDR: Stochastic gradient descent with warm restarts

    Ilya Loshchilov and Frank Hutter. SGDR: Stochastic gradient descent with warm restarts. In International Conference on Learning Representations (ICLR), 2017

  50. [50]

    Direct preference optimization: Your language model is secretly a reward model

    Rafael Rafailov et al. Direct preference optimization: Your language model is secretly a reward model. In Advances in Neural Information Processing Systems (NeurIPS), 2023

  51. [51]

    Ralph Allan Bradley and Milton E. Terry. Rank analysis of incomplete block designs: I. the method of paired comparisons. Biometrika, 39(3/4):324–345, 1952

  52. [52]

    Qwen3-vl-235b-a22b-instruct

    Qwen Team. Qwen3-vl-235b-a22b-instruct. https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Instruct, 2025. Model card and weights on Hugging Face. Accessed 2026-02-24

  53. [53]

    unsloth/gemma-3-27b-it

    Unsloth AI. unsloth/gemma-3-27b-it. https://huggingface.co/unsloth/gemma-3-27b-it, 2025. Model card and weights on Hugging Face. Accessed 2026-02-24

  54. [54]

    Principled data selection for alignment: The hidden risks of difficult examples. CoRR, abs/2502.09650, 2025

    Chengqian Gao, Haonan Li, Liu Liu, Zeke Xie, Peilin Zhao, and Zhiqiang Xu. Principled data selection for alignment: The hidden risks of difficult examples. CoRR, abs/2502.09650, 2025

  55. [55]

    Beyond reward margin: Rethinking and resolving likelihood displacement in diffusion models via video generation. CoRR, abs/2511.19049, 2025

    Ruojun Xu, Yu Kai, Xuhua Ren, Jiaxiang Cheng, Bing Ma, Tianxiang Zheng, and Qinhlin Lu. Beyond reward margin: Rethinking and resolving likelihood displacement in diffusion models via video generation. CoRR, abs/2511.19049, 2025

  56. [56]

    Integer quantization for deep learning inference: Principles and empirical evaluation

    Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev, and Paulius Micikevicius. Integer quantization for deep learning inference: Principles and empirical evaluation. arXiv preprint arXiv:2004.09602, 2020

  57. [57]

    Advances in the neural network quantization: A comprehensive review. Applied Sciences, 14(17):7445, 2024

    Lu Wei, Zhong Ma, Chaojie Yang, and Qin Yao. Advances in the neural network quantization: A comprehensive review. Applied Sciences, 14(17):7445, 2024

  58. [58]

    Awq: Activation-aware weight quantization for llm compression and acceleration

    Ji Lin, Jiaming Tang, Haotian Tang, Shang Yang, Wei-Ming Chen, Wei-Chen Wang, Guangxuan Xiao, Xingyu Dang, Chuang Gan, and Song Han. Awq: Activation-aware weight quantization for llm compression and acceleration. In Proceedings of the Machine Learning and Systems (MLSys) Conference, 2024

  59. [59]

    LLM Compressor, 8 2024

    Red Hat AI and vLLM Project. LLM Compressor, 8 2024

  60. Shiyao Li et al. MBQ: Modality-balanced quantization for large vision-language models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025.

    A PREPRINT - APRIL 17, 2026

  61. Yuliang Liu, Zhang Li, Mingxin Huang, Biao Yang, Wenwen Yu, Chunyuan Li, Xucheng Yin, Cheng-Lin Liu, Lianwen Jin, and Xiang Bai. OCRBench: On the hidden mystery of OCR in large multimodal models, 2023.

  62. Allen Institute for AI. olmocr-bench. Hugging Face Datasets, 2025. Accessed: 2026-03-27.

  63. Moniele Kunrath Santos, Guilherme Bazzo, Lucas Lima de Oliveira, and Viviane Pereira Moreira. ESTER-PT: An evaluation suite for text recognition in Portuguese. In Document Analysis and Recognition - ICDAR 2023: 17th International Conference, San José, CA, USA, August 21–26, 2023, Proceedings, Part III, pages 366–383, Berlin, Heidelberg, 2023. Springer-Verlag.

  64. Arthur F. S. Neto, Byron L. D. Bezerra, Sávio S. Araújo, W. M. A. S. Souza, K. F. Alves, M. F. Oliveira, S. V. S. Lins, H. J. F. Hazin, P. H. V. Rocha, and Alejandro H. Toselli. BRESSAY: A Brazilian Portuguese dataset for offline handwritten text recognition. In Proceedings of the 18th International Conference on Document Analysis and Recognition (ICDAR)...

  65. Vladimir I. Levenshtein. Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady, 1966. Translated from Doklady Akademii Nauk SSSR.

  66. Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), pages 311–318, 2002.

  67. Xiaobin Ouyang et al. OmniDocBench: Benchmarking diverse PDF document parsing with comprehensive annotations, 2024.

  68. Wonseok Hwang et al. Disgo: A unified model for document image similarity, glyph, and OCR, 2023.

A Appendix

A.1 Impact of text degeneration on system performance

Qwen2.5-VL-7B-Instruct [37] was served with vLLM to evaluate the impact of text degeneration on system performance. Three OCR datasets, namely, Dharma...
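The degeneration studied here is the looping kind, where the model repeats a short span until it exhausts the token budget. As a minimal illustration of how such outputs might be flagged (the function name, n-gram window, and repeat threshold below are assumptions for this sketch, not the paper's actual detector):

```python
def looks_degenerate(text: str, max_ngram: int = 20, min_repeats: int = 3) -> bool:
    """Heuristic loop detector: flag output whose tail is a single
    character n-gram (any length up to max_ngram) repeated back-to-back
    at least min_repeats times. Thresholds are illustrative only."""
    for n in range(1, max_ngram + 1):
        window = n * min_repeats
        if len(text) < window:
            break
        tail = text[-window:]
        if tail == tail[-n:] * min_repeats:
            return True
    return False

# A looping tail is flagged; ordinary prose is not.
print(looks_degenerate("valid transcription " + "end of page. " * 10))              # True
print(looks_degenerate("a normally transcribed paragraph without repeated spans here"))  # False
```

A detector of this sort is one plausible way to label generations as "rejected" when building DPO preference pairs, since it touches only the output string and needs no reference transcription.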

[Excerpt of a structured JSON extraction from a Brazilian legal document, illustrating the output schema: a long Portuguese "text" value, "header": "DOUTRINA NACIONAL 149", "margin": null, and a "footer" carrying the running citation line.]

The preference pairs were constructed as follows:

1. The aggregated score was computed for each instance and each response as the arithmetic mean of the four criterion scores.

2. All-vs-all pairing among the five responses (10 pairs per instance) was generated for each instance, yielding the 237 260 candidate pairs previously reported.

3. A multi-stage filtering policy, summarized in Table 6 and detailed below, was applied. In general terms, it pursues two complementary objectives, namely, (i) to ensure the included pairs provide an instructive signal for preference learning (maximize signal / reduce noise) and (ii) to avoid pairs that induce optimization conflicts or probability-shift effects, as iden...
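The aggregated scoring and all-vs-all pairing described above can be sketched as follows (the dict fields and the 0–1 criterion scale are assumptions; the paper does not specify its exact data format, and the later filtering stages are omitted):

```python
from itertools import combinations

def aggregate_score(criterion_scores):
    """Arithmetic mean of the four per-criterion scores for one response."""
    assert len(criterion_scores) == 4
    return sum(criterion_scores) / len(criterion_scores)

def candidate_pairs(responses):
    """All-vs-all pairing among one instance's responses.

    With five responses this yields C(5, 2) = 10 ordered
    (chosen, rejected) candidate pairs, higher aggregated score first;
    the multi-stage filtering (e.g. dropping ties) happens downstream.
    """
    pairs = []
    for a, b in combinations(responses, 2):
        chosen, rejected = sorted((a, b), key=lambda r: r["score"], reverse=True)
        pairs.append((chosen, rejected))
    return pairs

# Five hypothetical responses for one instance.
responses = [
    {"id": i, "score": aggregate_score(c)}
    for i, c in enumerate([[1, 1, 1, 1], [1, 1, 0, 1], [0, 1, 0, 1], [1, 0, 0, 0], [0, 0, 0, 0]])
]
print(len(candidate_pairs(responses)))  # 10 pairs per instance
```

Applied to every benchmark instance, the 10 pairs per instance accumulate into the full candidate set that the filtering policy then prunes.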

The average token counts per page were used for token-priced APIs to estimate an average per-page cost, which was then multiplied by one million to obtain the cost per million pages. Denoting the average numbers of input and output tokens per page by $n_{\mathrm{in}}$ and $n_{\mathrm{out}}$, and the corresponding prices per million tokens by $p_{\mathrm{in}}$ and $p_{\mathrm{out}}$, one has

$$C_{1\mathrm{M}} = 10^{6}\left(\frac{n_{\mathrm{in}}\,p_{\mathrm{in}}}{10^{6}} + \frac{n_{\mathrm{out}}\,p_{\mathrm{out}}}{10^{6}}\right) \qquad (6)$$
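The per-page cost formula above can be checked with a few lines; note that the two factors of $10^{6}$ cancel, so the cost per million pages is simply the per-page token counts weighted by the per-million-token prices (the token counts and prices below are hypothetical, not the paper's measured values):

```python
def cost_per_million_pages(n_in, n_out, p_in, p_out):
    """Cost per one million pages, following Eq. (6): n_in and n_out are
    average token counts per page, p_in and p_out are prices per million
    tokens. Per-page cost = n_in*p_in/1e6 + n_out*p_out/1e6; multiplying
    by 1e6 pages cancels the 1e6 factors."""
    per_page = n_in * p_in / 1e6 + n_out * p_out / 1e6
    return 1e6 * per_page

# Hypothetical example: $0.50 / M input tokens, $1.50 / M output tokens,
# 1500 input and 700 output tokens per page on average.
print(cost_per_million_pages(1500, 700, 0.5, 1.5))  # ≈ 1800.0 USD per million pages
```

The cancellation also makes the formula robust to unit mistakes: as long as counts are per page and prices are per million tokens, the result is directly in currency per million pages.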