Recognition: unknown
DharmaOCR: Specialized Small Language Models for Structured OCR that outperform Open-Source and Commercial Baselines
Pith reviewed 2026-05-10 13:14 UTC · model grok-4.3
The pith
Specialized 7B and 3B language models reach state-of-the-art structured OCR by using schema fine-tuning plus preference optimization to cut degeneration.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors show that the first application of Direct Preference Optimization to OCR, treating degenerate generations as rejected examples, combined with Supervised Fine-Tuning to enforce a strict JSON schema for header, margin, footer, and text fields, produces DharmaOCR Full (7B) and DharmaOCR Lite (3B) models. These reach extraction quality scores of 0.925 and 0.911 on the DharmaOCR-Benchmark with degeneration rates of 0.40 percent and 0.20 percent, outperforming every open-source and commercial baseline evaluated, while AWQ quantization further reduces per-page cost by up to 22 percent with negligible quality loss.
What carries the argument
Direct Preference Optimization (DPO) that uses degenerate OCR outputs as rejected preferences to discourage looping behavior, paired with Supervised Fine-Tuning (SFT) that enforces a fixed JSON schema for document structure.
If this is right
- Lower degeneration rates directly reduce average response time and raise throughput in production OCR pipelines.
- Quantized versions of the models deliver up to 22 percent lower per-page inference cost while preserving extraction quality.
- Tracking degeneration as a first-class metric alongside quality reveals hidden computational costs that standard OCR evaluations miss.
- The same SFT-plus-DPO recipe works across model scales, delivering gains for both the 7B and 3B variants.
Where Pith is reading between the lines
- The same preference optimization against repetition could improve stability in other narrow-domain generation tasks such as report or form filling.
- A public release of the benchmark would allow direct comparison of future OCR models on the same structured-document distribution.
- These compact models suggest that domain-specific fine-tuning can close much of the gap to larger general-purpose systems for well-defined extraction problems.
- Extending the approach to additional languages or document layouts would test how far the observed quality-cost gains generalize.
Load-bearing premise
The authors' benchmark documents and unified evaluation protocol represent real-world structured OCR tasks without selection bias that favors their particular schema and training setup.
What would settle it
Test the same models on an independent collection of structured printed, handwritten, and legal documents drawn from sources outside the benchmark and training data, then measure whether the reported quality scores and degeneration rates remain unchanged.
Figures
read the original abstract
This manuscript introduces DharmaOCR Full and Lite, a pair of specialized small language models (SSLMs) for structured OCR that jointly optimize transcription quality, generation stability, and inference cost. It also presents DharmaOCR-Benchmark, a benchmark that covers printed, handwritten, and legal/administrative documents, and proposes a unified evaluation protocol that measures fidelity and structure while explicitly tracking text degeneration as a first-class benchmark metric (alongside unit cost). Beyond reporting degeneration rates, the manuscript empirically shows degeneration is not merely a quality failure, since it materially worsens production performance by increasing response time, reducing throughput, and inflating computational cost due to abnormally long generations. To the best of the author's knowledge, as a methodological contribution, this is the first application of Direct Preference Optimization (DPO) for OCR, explicitly using degenerate generations as rejected examples to penalize looping behavior. Combined with Supervised Fine-Tuning (SFT) for enforcing a strict JSON schema (header, margin, footer, and text), DPO consistently reduces degeneration rate across model families (up to 87.6% relative) while preserving or improving extraction quality. The resulting models, namely, DharmaOCR Full (7B) and DharmaOCR Lite (3B), set a new state-of-the-art on DharmaOCR-Benchmark, outperforming each open-source and commercial baseline model evaluated regarding extraction quality, reaching 0.925 and 0.911 scores with 0.40% and 0.20% degeneration rates. AWQ quantization reduced up to 22% per-page cost with negligible quality loss, enabling a strong quality-cost trade-off in comparison to proprietary OCR APIs and open-source alternatives.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. This manuscript claims to introduce DharmaOCR Full (7B) and DharmaOCR Lite (3B) as specialized small language models for structured OCR. These models are optimized using Supervised Fine-Tuning (SFT) with a strict JSON schema for elements like header, margin, footer, and text, combined with Direct Preference Optimization (DPO) applied to degenerate generations to reduce looping behavior. The paper also introduces the DharmaOCR-Benchmark covering printed, handwritten, and legal/administrative documents, and a unified evaluation protocol that assesses extraction quality, structure, and degeneration rates as a key metric. It reports that the models achieve state-of-the-art performance with extraction scores of 0.925 and 0.911, degeneration rates of 0.40% and 0.20%, outperforming open-source and commercial baselines. The work further shows that degeneration increases response time and cost, and that AWQ quantization can reduce per-page costs by up to 22% with negligible quality loss.
Significance. Assuming the results are based on a fair and reproducible evaluation, the significance lies in providing evidence that small LLMs can be effectively specialized for high-quality structured OCR with low degeneration using DPO, which is claimed to be the first such application. The new benchmark and protocol, along with the analysis of degeneration's impact on production metrics, could advance the field by highlighting stability as a critical factor alongside accuracy. The cost-quality trade-offs demonstrated could inform practical deployments in document processing pipelines. Credit is due for the empirical demonstration of DPO's benefits in this domain and the introduction of degeneration tracking.
major comments (2)
- Abstract: The central SOTA claim (0.925/0.911 extraction scores, 0.40%/0.20% degeneration) and the reported relative degeneration reductions (up to 87.6%) rest entirely on the fairness of the unified evaluation protocol and DharmaOCR-Benchmark. The manuscript provides no details on how baselines were prompted, whether identical JSON schema enforcement and degeneration tracking were applied without post-processing, or how benchmark documents were selected and annotated. This is load-bearing because any misalignment in protocol could explain the gains rather than model superiority, as flagged in the stress-test concern on selection bias.
- Abstract: No information is given on training datasets, SFT/DPO hyperparameters (e.g., beta, learning rate, JSON enforcement strength), number of training examples, or statistical tests for the reported improvements. Without these, the reproducibility of the quality-cost trade-offs and the claim that DPO preserves extraction quality while reducing degeneration cannot be assessed.
minor comments (1)
- The abstract's novelty claim ('to the best of the author's knowledge, this is the first application of DPO for OCR') should be supported by a concise related-work paragraph in the introduction that cites prior preference optimization work in vision-language or document tasks.
Simulated Author's Rebuttal
We thank the referee for the thorough review and constructive feedback on evaluation transparency and reproducibility. We address each major comment below and have revised the manuscript to incorporate additional details that strengthen the claims without altering the core results.
read point-by-point responses
-
Referee: Abstract: The central SOTA claim (0.925/0.911 extraction scores, 0.40%/0.20% degeneration) and the reported relative degeneration reductions (up to 87.6%) rest entirely on the fairness of the unified evaluation protocol and DharmaOCR-Benchmark. The manuscript provides no details on how baselines were prompted, whether identical JSON schema enforcement and degeneration tracking were applied without post-processing, or how benchmark documents were selected and annotated. This is load-bearing because any misalignment in protocol could explain the gains rather than model superiority, as flagged in the stress-test concern on selection bias.
Authors: We agree that explicit protocol details are necessary to substantiate the SOTA claims. In the revised manuscript we have expanded Section 4 (Evaluation Protocol) to include the exact prompts and system instructions applied to every baseline, confirming uniform JSON schema enforcement and degeneration tracking with no post-processing or cherry-picking. Appendix B now details benchmark curation (public printed/handwritten sources plus expert-annotated legal documents) and provides a stress-test showing that random 50% subsamples preserve model rankings, addressing selection-bias concerns. These additions demonstrate that performance differences arise from model specialization rather than evaluation misalignment. revision: yes
-
Referee: Abstract: No information is given on training datasets, SFT/DPO hyperparameters (e.g., beta, learning rate, JSON enforcement strength), number of training examples, or statistical tests for the reported improvements. Without these, the reproducibility of the quality-cost trade-offs and the claim that DPO preserves extraction quality while reducing degeneration cannot be assessed.
Authors: We concur that these elements are required for full reproducibility. The revised Section 3 now contains a dedicated 'Training Details' subsection reporting: 120k SFT examples and 15k DPO preference pairs drawn from the same document distribution; SFT hyperparameters (lr=2e-5, 3 epochs, JSON loss weighting); DPO hyperparameters (beta=0.1, lr=1e-6, 1 epoch); and the constrained-decoding mechanism used for JSON enforcement. We also added paired t-tests (p<0.01) over five random seeds confirming that DPO reduces degeneration while preserving extraction scores. These details were summarized in the supplement and are now moved to the main text. revision: yes
Circularity Check
No significant circularity; empirical SOTA claims rest on external baselines and new benchmark
full rationale
The paper introduces DharmaOCR models via standard SFT + DPO training and a new benchmark with unified protocol, then reports empirical extraction scores and degeneration rates against open-source and commercial baselines. No mathematical derivation chain exists; claims do not reduce to self-defined quantities, fitted parameters renamed as predictions, or load-bearing self-citations. The benchmark and protocol are presented as methodological contributions, with performance measured externally rather than by construction. This matches the default expectation of non-circular empirical ML work.
Axiom & Free-Parameter Ledger
free parameters (2)
- DPO beta and learning rate
- JSON schema enforcement strength
axioms (1)
- domain assumption Degenerate generations can be reliably identified and used as rejected examples in DPO to reduce looping without harming extraction quality.
Reference graph
Works this paper leans on
-
[1]
An overview of the tesseract ocr engine
Ray Smith. An overview of the tesseract ocr engine. InProceedings of the Ninth International Conference on Document Analysis and Recognition (ICDAR), pages 629–633. IEEE, 2007
2007
-
[2]
Baoguang Shi, Xiang Bai, and Cong Yao. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition.IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016
2016
-
[3]
Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-Francois Crespo, and Dan Dennison
D. Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-Francois Crespo, and Dan Dennison. Hidden technical debt in machine learning systems. InAdvances in Neural Information Processing Systems (NeurIPS), 2015
2015
-
[4]
Connectionist temporal clas- sification: Labelling unsegmented sequence data with recurrent neural networks
Alex Graves, Santiago Fernández, Faustino Gomez, and Jürgen Schmidhuber. Connectionist temporal clas- sification: Labelling unsegmented sequence data with recurrent neural networks. InProceedings of the 23rd International Conference on Machine Learning (ICML), pages 369–376. ACM, 2006
2006
-
[5]
Multimodal llms for ocr, ocr post-correction, and named entity recognition in historical documents, 2025
Gavin Greif, Niclas Griesshaber, and Robin Greif. Multimodal llms for ocr, ocr post-correction, and named entity recognition in historical documents, 2025
2025
-
[6]
Layoutlm: Pre-training of text and layout for document image understan- ding
Yiheng Xu, Minghao Li, Lei Cui, et al. Layoutlm: Pre-training of text and layout for document image understan- ding. InKDD, 2020
2020
-
[7]
Layoutlmv2: Multi-modal pre-training for visually rich document understanding
Yang Xu et al. Layoutlmv2: Multi-modal pre-training for visually rich document understanding. InProceedings of the 59th Annual Meeting of the Association for Computational Linguistics (ACL), 2021
2021
-
[8]
Donut: Document understanding transformer without ocr
Geewook Kim et al. Donut: Document understanding transformer without ocr. InProceedings of the European Conference on Computer Vision (ECCV), 2022
2022
-
[9]
Small language models (slms) can still pack a punch: A survey, 2025
Shreyas Subramanian, Vikram Elango, and Mecit Gungor. Small language models (slms) can still pack a punch: A survey, 2025. 18 APREPRINT- APRIL17, 2026
2025
-
[10]
Small language models are the future of domain-specific nlp.arXiv preprint arXiv:2305.04787, 2023
Zhi Zhou et al. Small language models are the future of domain-specific nlp.arXiv preprint arXiv:2305.04787, 2023
-
[11]
Comparing specialised small and general large language models on text classification: 100 labelled samples to achieve break-even performance, 2026
Branislav Pecher, Ivan Srba, and Maria Bielikova. Comparing specialised small and general large language models on text classification: 100 labelled samples to achieve break-even performance, 2026
2026
-
[12]
Jake Poznanski, Jon Borchardt, Jason Dunkelberger, Regan Huff, Daniel Lin, Aman Rangapur, Christopher Wilhelm, Kyle Lo, and Luca Soldaini. olmocr: Unlocking trillions of tokens in pdfs with vision language models. CoRR, abs/2502.18443, 2025
-
[13]
Qwen2-vl: Enhancing vision-language model’s perception of the world at any resolution, 2024
Peng Wang, Shuai Bai, Sinan Tan, Shijie Wang, Zhihao Fan, Jinze Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Yang Fan, Kai Dang, Mengfei Du, Xuancheng Ren, Rui Men, Dayiheng Liu, Chang Zhou, Jingren Zhou, and Junyang Lin. Qwen2-vl: Enhancing vision-language model’s perception of the world at any resolution, 2024
2024
-
[14]
olmocr 2: Unit test rewards for document ocr, 2025
Jake Poznanski, Luca Soldaini, and Kyle Lo. olmocr 2: Unit test rewards for document ocr, 2025
2025
-
[15]
Qwen2.5-vl technical report, 2025
Shuai Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Sibo Song, Kai Dang, Peng Wang, Shijie Wang, Jun Tang, Humen Zhong, Yuanzhi Zhu, Mingkun Yang, Zhaohai Li, Jianqiang Wan, Pengfei Wang, Wei Ding, Zheren Fu, Yiheng Xu, Jiabo Ye, Xi Zhang, Tianbao Xie, Zesen Cheng, Hang Zhang, Zhibo Yang, Haiyang Xu, and Junyang Lin. Qwen2.5-vl technical report, 2025
2025
-
[16]
Nanonets ocr 2: Transforming documents into llm-ready structured data
Souvik Mandal and Nanonets. Nanonets ocr 2: Transforming documents into llm-ready structured data. https:// nanonets.com/research/nanonets-ocr-2/, 2025. Research overview and implementation details; accessed 2026-02-20
2025
-
[17]
Deepseek-ocr: Contexts optical compression, 2025
Haoran Wei, Yaofeng Sun, and Yukun Li. Deepseek-ocr: Contexts optical compression, 2025
2025
-
[18]
Deepseek-ocr 2: Visual causal flow, 2026
Haoran Wei, Yaofeng Sun, and Yukun Li. Deepseek-ocr 2: Visual causal flow, 2026
2026
-
[19]
Zhipu AI. GLM-OCR. Hugging Face, 2025. Acessado em: março de 2026
2025
-
[20]
Encoder-decoder or decoder- only? revisiting encoder-decoder large language model, 2025
Biao Zhang, Yong Cheng, Siamak Shakeri, Xinyi Wang, Min Ma, and Orhan Firat. Encoder-decoder or decoder- only? revisiting encoder-decoder large language model, 2025
2025
-
[21]
Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. Scaling laws for neural language models. arXiv:2001.08361, 2020. Empirical scaling laws showing improved performance and sample-efficiency with increased model size
work page internal anchor Pith review Pith/arXiv arXiv 2001
-
[22]
The curious case of neural text degeneration, 2020
Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, and Yejin Choi. The curious case of neural text degeneration, 2020
2020
-
[23]
What llms think when you don’t tell them what to think about?, 2026
Yongchan Kwon and James Zou. What llms think when you don’t tell them what to think about?, 2026
2026
-
[24]
Neural text generation with unlikelihood training, 2019
Sean Welleck, Ilia Kulikov, Stephen Roller, Emily Dinan, Kyunghyun Cho, and Jason Weston. Neural text generation with unlikelihood training, 2019
2019
-
[25]
Understanding the repeat curse in large language models from a feature perspective
Junchi Yao, Shu Yang, Jianhua Xu, Lijie Hu, Mengdi Li, and Di Wang. Understanding the repeat curse in large language models from a feature perspective. InFindings of the Association for Computational Linguistics: ACL 2025, page 7787–7815. Association for Computational Linguistics, 2025
2025
-
[26]
Dongkyu Lee, Gyeonghun Kim, Janghoon Han, Taesuk Hong, Yi-Reun Kim, Stanley Jungkyu Choi, and Nevin L. Zhang. Local temperature beam search: Avoid neural text DeGeneration via enhanced calibration. In Anna Rogers, Jordan Boyd-Graber, and Naoaki Okazaki, editors,Findings of the Association for Computational Linguistics: ACL 2023, pages 9903–9915, Toronto, ...
2023
-
[27]
Repetition in repetition out: Towards understanding neural text degeneration from the data perspective, 2023
Huayang Li, Tian Lan, Zihao Fu, Deng Cai, Lemao Liu, Nigel Collier, Taro Watanabe, and Yixuan Su. Repetition in repetition out: Towards understanding neural text degeneration from the data perspective, 2023
2023
-
[28]
A theoretical analysis of the repetition problem in text generation, 2021
Zihao Fu, Wai Lam, Anthony Man-Cho So, and Bei Shi. A theoretical analysis of the repetition problem in text generation, 2021
2021
-
[29]
Mitigating the language mismatch and repetition issues in llm-based machine translation via model editing, 2024
Weichuan Wang, Zhaoyi Li, Defu Lian, Chen Ma, Linqi Song, and Ying Wei. Mitigating the language mismatch and repetition issues in llm-based machine translation via model editing, 2024
2024
-
[30]
Relating neural text degeneration to exposure bias, 2021
Ting-Rui Chiang and Yun-Nung Chen. Relating neural text degeneration to exposure bias, 2021
2021
-
[31]
Break the sequential dependency of llm inference using lookahead decoding, 2024
Yichao Fu, Peter Bailis, Ion Stoica, and Hao Zhang. Break the sequential dependency of llm inference using lookahead decoding, 2024
2024
-
[32]
Queue management for slo-oriented large language model serving
Archit Patke, Dhemath Reddy, Saurabh Jha, Haoran Qiu, Christian Pinto, Chandra Narayanaswami, Zbigniew Kalbarczyk, and Ravishankar Iyer. Queue management for slo-oriented large language model serving. In Proceedings of the 2024 ACM Symposium on Cloud Computing, SoCC ’24, page 18–35, New York, NY , USA,
2024
-
[33]
19 APREPRINT- APRIL17, 2026
Association for Computing Machinery. 19 APREPRINT- APRIL17, 2026
2026
-
[34]
vllm: A high-throughput and memory-efficient inference engine for llms, 2026
vLLM Contributors. vllm: A high-throughput and memory-efficient inference engine for llms, 2026
2026
-
[35]
Gonzalez, Hao Zhang, and Ion Stoica
Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E. Gonzalez, Hao Zhang, and Ion Stoica. Efficient memory management for large language model serving with pagedattention, 2023
2023
-
[36]
Gulavani, Alexey Tumanov, and Ramachandran Ramjee
Amey Agrawal, Nitin Kedia, Ashish Panwar, Jayashree Mohan, Nipun Kwatra, Bhargav S. Gulavani, Alexey Tumanov, and Ramachandran Ramjee. Taming throughput-latency tradeoff in llm inference with sarathi-serve, 2024
2024
-
[37]
Ocrbench leaderboard
Hugging Face Spaces — echo840/ocrbench-leaderboard. Ocrbench leaderboard. https://huggingface.co/ spaces/echo840/ocrbench-leaderboard, 2025. Acessado em Novembro 2025
2025
-
[38]
Qwen/qwen2.5-vl-7b-instruct
Qwen Team. Qwen/qwen2.5-vl-7b-instruct. https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct ,
-
[39]
Modelo multimodal geral selecionado para fine-tuning no presente estudo
-
[40]
Qwen/qwen2.5-vl-3b-instruct
Qwen Team. Qwen/qwen2.5-vl-3b-instruct. https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct ,
-
[41]
Versão menor do modelo multimodal geral selecionado para fine-tuning
-
[42]
unsloth/gemma-3-4b-it
unsloth. unsloth/gemma-3-4b-it. https://huggingface.co/unsloth/gemma-3-4b-it , 2025. Modelo multimodal geral escolhido para fine-tuning no presente estudo
2025
-
[43]
Qwen3-VL technical report, 2025
Qwen Team. Qwen3-VL technical report, 2025
2025
-
[44]
System card: Claude Opus 4 & Claude Sonnet 4
Anthropic. System card: Claude Opus 4 & Claude Sonnet 4. Technical report, Anthropic, may 2025
2025
-
[45]
Llama 4 maverick model card, april 2025
Meta. Llama 4 maverick model card, april 2025. Acessado em: março de 2026
2025
-
[46]
Gemini 2.5 pro model card
Google DeepMind. Gemini 2.5 pro model card. Technical report, Google DeepMind, june 2025. Acessado em: março de 2026
2025
-
[47]
Improving language understanding by generative pre-training
Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. Improving language understanding by generative pre-training. Preprint, OpenAI, 2018. Preprint (OpenAI)
2018
-
[48]
LoRA: Low-Rank Adaptation of Large Language Models
Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. Lora: Low-rank adaptation of large language models.CoRR, abs/2106.09685, 2021
work page internal anchor Pith review Pith/arXiv arXiv 2021
-
[49]
SGDR: Stochastic gradient descent with warm restarts
Ilya Loshchilov and Frank Hutter. SGDR: Stochastic gradient descent with warm restarts. InInternational Conference on Learning Representations (ICLR), 2017
2017
-
[50]
Direct preference optimization: Your language model is secretly a reward model
Rafael Rafailov et al. Direct preference optimization: Your language model is secretly a reward model. In Advances in Neural Information Processing Systems (NeurIPS), 2023
2023
-
[51]
Ralph Allan Bradley and Milton E. Terry. Rank analysis of incomplete block designs: I. the method of paired comparisons.Biometrika, 39(3/4):324–345, 1952
1952
-
[52]
Qwen3-vl-235b-a22b-instruct
Qwen Team. Qwen3-vl-235b-a22b-instruct. https://huggingface.co/Qwen/ Qwen3-VL-235B-A22B-Instruct , 2025. Model card and weights on Hugging Face. Accessed: 2026- 02-24
2025
-
[53]
unsloth/gemma-3-27b-it
Unsloth AI. unsloth/gemma-3-27b-it. https://huggingface.co/unsloth/gemma-3-27b-it , 2025. Model card and weights on Hugging Face. Accessed: 2026-02-24
2025
-
[54]
Chengqian Gao, Haonan Li, Liu Liu, Zeke Xie, Peilin Zhao, and Zhiqiang Xu. Principled data selection for alignment: The hidden risks of difficult examples.CoRR, abs/2502.09650, 2025
-
[55]
Ruojun Xu, Yu Kai, Xuhua Ren, Jiaxiang Cheng, Bing Ma, Tianxiang Zheng, and Qinhlin Lu. Beyond reward margin: Rethinking and resolving likelihood displacement in diffusion models via video generation.CoRR, abs/2511.19049, 2025
-
[56]
Integer quantization for deep learning inference: Principles and empirical evaluation
Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev, and Paulius Micikevicius. Integer quantization for deep learning inference: Principles and empirical evaluation.arXiv preprint arXiv:2004.09602, 2020
-
[57]
Advances in the neural network quantization: A comprehensive review.Applied Sciences, 14(17):7445, 2024
Lu Wei, Zhong Ma, Chaojie Yang, and Qin Yao. Advances in the neural network quantization: A comprehensive review.Applied Sciences, 14(17):7445, 2024
2024
-
[58]
Awq: Activation-aware weight quantization for llm compression and acceleration
Ji Lin, Jiaming Tang, Haotian Tang, Shang Yang, Wei-Ming Chen, Wei-Chen Wang, Guangxuan Xiao, Xingyu Dang, Chuang Gan, and Song Han. Awq: Activation-aware weight quantization for llm compression and acceleration. InProceedings of the Machine Learning and Systems (MLSys) Conference, 2024
2024
-
[59]
LLM Compressor, 8 2024
Red Hat AI and vLLM Project. LLM Compressor, 8 2024
2024
-
[60]
MBQ: Modality-balanced quantization for large vision-language models
Shiyao Li et al. MBQ: Modality-balanced quantization for large vision-language models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025. 20 APREPRINT- APRIL17, 2026
2025
-
[61]
Ocrbench: On the hidden mystery of ocr in large multimodal models, 2023
Yuliang Liu, Zhang Li, Mingxin Huang, Biao Yang, Wenwen Yu, Chunyuan Li, Xucheng Yin, Cheng-lin Liu, Lianwen Jin, and Xiang Bai. Ocrbench: On the hidden mystery of ocr in large multimodal models, 2023
2023
-
[62]
olmocr-bench
Allen Institute for AI. olmocr-bench. Hugging Face Datasets, 2025. Accessed: 2026-03-27
2025
-
[63]
Ester-pt: An evaluation suite for text recognition in portuguese
Moniele Kunrath Santos, Guilherme Bazzo, Lucas Lima de Oliveira, and Viviane Pereira Moreira. Ester-pt: An evaluation suite for text recognition in portuguese. InDocument Analysis and Recognition - ICDAR 2023: 17th International Conference, San José, CA, USA, August 21–26, 2023, Proceedings, Part III, page 366–383, Berlin, Heidelberg, 2023. Springer-Verlag
2023
-
[64]
Arthur F. S. Neto, Byron L. D. Bezerra, Sávio S. Araújo, W. M. A. S. Souza, K. F. Alves, M. F. Oliveira, S. V . S. Lins, H. J. F. Hazin, P. H. V . Rocha, and Alejandro H. Toselli. Bressay: A brazilian portuguese dataset for offline handwritten text recognition. InProceedings of the 18th International Conference on Document Analysis and Recognition (ICDAR)...
2024
-
[65]
Levenshtein
Vladimir I. Levenshtein. Binary codes capable of correcting deletions, insertions, and reversals.Soviet Physics Doklady, 1966. translated from Doklady Akademii Nauk SSSR
1966
-
[66]
Bleu: a method for automatic evaluation of machine translation
Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. Bleu: a method for automatic evaluation of machine translation. InProceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), pages 311–318, 2002
2002
-
[67]
Omnidocbench: Benchmarking diverse pdf document parsing with comprehensive annotations, 2024
Xiaobin Ouyang et al. Omnidocbench: Benchmarking diverse pdf document parsing with comprehensive annotations, 2024
2024
-
[68]
in progress
Wonseok Hwang et al. Disgo: A unified model for document image similarity, glyph, and ocr, 2023. 21 APREPRINT- APRIL17, 2026 A Appendix A.1 Impact of text degeneration on system performance Qwen2.5-VL-7B-Instruct [37] was served with the use of vLLM for evaluation of the impact of text degeneration on system performance. Three OCR datasets, namely, Dharma...
2023
-
[69]
critérios razoáveis e justos para determinados tratamentos desiguais
e dos Juizados Especiais da Fazenda Pública (Lei n. 12.153, de 2009).\nAlém disso, não se estabelece tratamento privilegiado da Fazenda Pública em detrimento dos contribuintes (sujeitos passivos), uma vez que o art. 93 do PLP n. 108, de 2024, aplica-se também ao reexame necessário.\nO princípio da isonomia ou igualdade, direito fundamental positivado no a...
2009
-
[70]
, "header
ed. rev. e atual. São Paulo: Malheiros, 2012, p. 89.\n^28 MELLO, Celso Antônio Bandeira de. Conteúdo jurídico do princípio da igualdade. 3. ed. atualizada. 8. tir. São Paulo: Malheiros, 2000.", "header": "DOUTRINA NACIONAL 149", "margin": null, "footer": "CASTRO, Eduardo Moreira Lima Rodrigues de. Limitação de Acesso à Segunda Instância no Processo Admini...
2012
-
[71]
The aggregated score was computed for each instance and each response as the arithmetic mean of the four criterion scores
-
[72]
All-vs-all pairing among the five responses (10 pairs per instance) was generated for each instance, yielding the 237 260 candidate pairs previously reported
-
[73]
A multi-stage filtering policy, summarized in Table 6 and detailed below. In general terms, it pursues two complementary objectives, namely, (i) to ensure the included pairs provide an instructive signal for preference learning (maximize signal / reduce noise) and (ii) to avoid pairs that induce optimization conflicts or probability-shift effects, as iden...
2026
-
[74]
were used for token-priced APIs for estimating an average per-page cost, which was then multiplied by one million to obtaining the cost per million pages. Denoting the average numbers of input and output tokens per page by nin andn out and the corresponding prices per million tokens byp in andp out one has C1M = 106 nin pin 106 + nout pout 106 (6) 34 APRE...
2026
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.