arxiv: 2604.09927 · v1 · submitted 2026-04-10 · 💻 cs.CV

BLPR: Robust License Plate Recognition under Viewpoint and Illumination Variations via Confidence-Driven VLM Fallback

Pith reviewed 2026-05-10 16:55 UTC · model grok-4.3

classification 💻 cs.CV
keywords license plate recognition · YOLO detector · synthetic data · domain adaptation · vision language model · Bolivian dataset · viewpoint variation · illumination robustness

The pith

BLPR reaches 89.6% character accuracy on real Bolivian license plates by pretraining a YOLO detector on Blender synthetic images and invoking a Gemma3 VLM only on low-confidence cases.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents BLPR, a two-stage license plate detection and recognition system built for Bolivian plates, which pose distinctive visual challenges and have little available training data. A YOLO detector is first pretrained on images synthesized in Blender to cover extreme viewpoints and lighting, then fine-tuned on real street captures from La Paz; detected plates are geometrically rectified before character recognition. When the main recognizer reports low confidence, a Gemma3 4B vision-language model is called as a fallback. The work also releases the first public Bolivian LPDR dataset. Together, these components address data scarcity and environmental variation, with the aim of reliable deployment in cities where standard models fail.

Core claim

The BLPR framework pretrains a YOLO-based detector on Blender-generated synthetic Bolivian plates to simulate viewpoint and illumination extremes, fine-tunes it on collected La Paz street data, applies geometric rectification, and selectively triggers a Gemma3 4B vision-language model as a confidence-driven fallback, yielding 89.6% character-level accuracy on real-world test data via synthetic-to-real domain adaptation.

What carries the argument

The confidence-driven VLM fallback that selectively invokes Gemma3 4B only on low-confidence outputs from the YOLO pipeline after geometric rectification of detected plates.
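
The gating logic described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the `primary` and `vlm` callables, the `Reading` type, and the default `threshold` of 0.8 are all hypothetical, since the review does not state the paper's actual cutoff or interfaces.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Reading:
    text: str
    confidence: float  # confidence reported by the primary recognizer
    source: str        # "primary" or "vlm"


def read_plate(
    plate_image: object,
    primary: Callable[[object], Tuple[str, float]],
    vlm: Callable[[object], str],
    threshold: float = 0.8,  # hypothetical value, not taken from the paper
) -> Reading:
    """Return the primary reading unless its confidence falls below threshold."""
    text, confidence = primary(plate_image)
    if confidence >= threshold:
        return Reading(text, confidence, "primary")
    # Low confidence: defer to the (slower) vision-language model.
    # The primary confidence is kept so callers can log why the VLM ran.
    return Reading(vlm(plate_image), confidence, "vlm")


if __name__ == "__main__":
    # Stub recognizers with made-up outputs, for illustration only.
    def stub_primary(img: object) -> Tuple[str, float]:
        return ("1234ABC", 0.95) if img == "clear" else ("12?4A8C", 0.40)

    def stub_vlm(img: object) -> str:
        return "1234ABC"

    print(read_plate("clear", stub_primary, stub_vlm))
    print(read_plate("blurry", stub_primary, stub_vlm))
```

Keeping the recognizers as injected callables makes it straightforward to ablate the fallback: passing a threshold of 0.0 disables the VLM entirely.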

If this is right

  • The pipeline reaches 89.6% character-level recognition accuracy on real Bolivian urban data.
  • Synthetic pretraining plus domain adaptation improves robustness to viewpoint distortion and illumination change.
  • The first public Bolivian LPDR dataset enables standardized evaluation under diverse conditions.
  • Selective VLM invocation adds robustness in ambiguous cases while limiting added compute.
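
For readers who want to check a character-level accuracy figure on their own predictions, one common definition is one minus the total edit distance divided by the total number of ground-truth characters. The sketch below uses that definition as an assumption; the review does not say which definition the paper uses, and a position-wise match rate would give different numbers.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """1 - (total edit distance / total reference characters), floored at 0."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    errors = sum(levenshtein(p, r) for p, r in zip(predictions, references))
    return max(0.0, 1.0 - errors / total_chars)


if __name__ == "__main__":
    # Made-up plate strings, for illustration only.
    preds = ["1234ABC", "5678XYZ"]
    refs = ["1234ABC", "5678XYS"]
    print(character_accuracy(preds, refs))
```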

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same synthetic-plus-selective-VLM pattern could be tested on license plates from other countries that also have distinctive designs and scarce labeled data.
  • Tuning the threshold that triggers the VLM could trade accuracy against latency for real-time traffic cameras.
  • Extending the Blender simulation to include motion blur or partial occlusion might further reduce the need for additional real-world collection.
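
The threshold trade-off mentioned in the second bullet can be explored with a simple sweep over a labeled validation set. The sketch below is an assumption-laden illustration: the `Record` type, the toy records, and the use of plate-level correctness are hypothetical, and the VLM call rate is used only as a proxy for added latency.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Record(NamedTuple):
    confidence: float      # primary recognizer confidence for this plate
    primary_correct: bool  # whether the primary reading was right
    vlm_correct: bool      # whether the VLM reading would have been right


def sweep_thresholds(
    records: Sequence[Record], thresholds: Sequence[float]
) -> List[Tuple[float, float, float]]:
    """For each threshold, return (threshold, accuracy, fraction of VLM calls)."""
    if not records:
        raise ValueError("records must not be empty")
    n = len(records)
    rows = []
    for t in thresholds:
        vlm_calls = sum(1 for r in records if r.confidence < t)
        correct = sum(
            1 for r in records
            if (r.vlm_correct if r.confidence < t else r.primary_correct)
        )
        rows.append((t, correct / n, vlm_calls / n))
    return rows


if __name__ == "__main__":
    # Toy validation records, invented for illustration.
    toy = [
        Record(0.95, True, True),
        Record(0.60, False, True),
        Record(0.30, False, False),
        Record(0.85, True, False),
    ]
    for row in sweep_thresholds(toy, [0.0, 0.7, 1.01]):
        print(row)
```

In this toy data the middle threshold beats both extremes, which is the pattern the selective-fallback argument relies on; whether real data behaves this way is exactly what an ablation would have to show.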

Load-bearing premise

The premise is twofold: that Blender-generated synthetic images capture the viewpoint and illumination variations in real La Paz street data well enough to transfer, and that selectively invoking the Gemma3 VLM on low-confidence cases improves net accuracy without introducing new errors or unacceptable latency.

What would settle it

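Two ablations on the real-world La Paz test set would settle it. If disabling the Gemma3 VLM fallback yields character accuracy equal to or higher than 89.6%, the fallback adds no net value. Conversely, a sharp drop in accuracy on held-out real plates when synthetic pretraining is removed would confirm that the synthetic-to-real transfer is doing real work.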

Figures

Figures reproduced from arXiv: 2604.09927 by Diego Calvimontes Vera, Edwin Salcedo, Guillermo Auza Banegas, Natalia Condori Peredo, Sergio Castro Sandoval.

Figure 1: Physical details of the Bolivian license plate defined
Figure 2: Analysis of anomalous license plate conditions in
Figure 3: Perspective variations from the BLPR-B dataset, annotated with the corresponding angles (in degrees) of each viewpoint.
Figure 4: Samples from the BLPR-B dataset with different
Figure 5: Per-character distribution in BLPR-C before and after
Figure 6: BLPR system proposed for license plate recognition under varying viewpoints and illumination conditions.
Figure 7: Qualitative results on challenging license plate samples from the BLPR-D dataset, comparing a raw detector-only
Original abstract

Robust license plate recognition in unconstrained environments remains a significant challenge, particularly in underrepresented regions with limited data availability and unique visual characteristics, such as Bolivia. Recognition accuracy in real-world conditions is often degraded by factors such as illumination changes and viewpoint distortion. To address these challenges, we introduce BLPR, a novel deep learning-based License Plate Detection and Recognition (LPDR) framework specifically designed for Bolivian license plates. The proposed system follows a two-stage pipeline where a YOLO-based detector is pretrained on synthetic data generated in Blender to simulate extreme perspectives and lighting conditions, and subsequently fine-tuned on street-level data collected in La Paz, Bolivia. Detected plates are geometrically rectified and passed to a character recognition model. To improve robustness under ambiguous scenarios, a lightweight vision-language model (Gemma3 4B) is selectively triggered as a confidence-based fallback mechanism. The proposed framework further leverages synthetic-to-real domain adaptation to improve robustness under diverse real-world conditions. We also introduce the first publicly available Bolivian LPDR dataset, enabling evaluation under diverse viewpoint and illumination conditions. The system achieves a character-level recognition accuracy of 89.6% on real-world data, demonstrating its effectiveness for deployment in challenging urban environments. Our project is publicly available at https://github.com/EdwinTSalcedo/BLPR.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces BLPR, a two-stage LPDR pipeline for Bolivian license plates that pretrains a YOLO detector on Blender-generated synthetic images simulating extreme viewpoints and illumination, fine-tunes it on real La Paz street data, applies geometric rectification, performs character recognition, and selectively invokes a Gemma3-4B VLM as a confidence-based fallback. It claims 89.6% character-level accuracy on real-world data and releases the first public Bolivian LPDR dataset.

Significance. If the accuracy claim is substantiated with proper controls, the release of the Bolivian dataset would be a tangible contribution for an underrepresented region, and the synthetic-pretraining plus selective-VLM strategy could offer a pragmatic route to robustness under viewpoint and illumination shifts. Otherwise, the work is an engineering application of existing components (YOLO, synthetic data, VLM fallback) whose added value remains unquantified.

major comments (3)
  1. [Abstract / Experiments] Abstract and Experiments section: the headline 89.6% character accuracy is stated without test-set cardinality, number of plates or images evaluated, baseline comparisons, per-character error breakdown, or statistical tests, so the central empirical claim cannot be assessed or attributed to the proposed mechanisms.
  2. [Method] Method section (VLM fallback): no ablation, accuracy delta, or latency measurement is supplied for the confidence-triggered Gemma3 invocation versus the base recognizer, nor is the trigger threshold calibrated or justified on held-out data.
  3. [Experiments / Dataset] Experiments / Dataset sections: no domain-shift quantification (FID, MMD, or similar) between Blender synthetic plates and La Paz real plates is reported, leaving the claim that synthetic pretraining plus domain adaptation closes the gap unsupported.
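
As an illustration of the kind of domain-shift measurement major comment 3 asks for, the following is a minimal sketch of a squared Maximum Mean Discrepancy (MMD) estimate with an RBF kernel. It assumes each image has already been mapped to a fixed-length feature vector by some embedding model; that step, the `gamma` value, and the toy vectors are assumptions and are not taken from the paper.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def rbf_kernel(x: Vector, y: Vector, gamma: float) -> float:
    """Gaussian (RBF) kernel exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mmd2_biased(xs: Sequence[Vector], ys: Sequence[Vector], gamma: float = 1.0) -> float:
    """Biased estimate of squared MMD between two samples of feature vectors."""
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")

    def mean_kernel(a: Sequence[Vector], b: Sequence[Vector]) -> float:
        return sum(rbf_kernel(u, v, gamma) for u in a for v in b) / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


if __name__ == "__main__":
    # Toy 2-D "features", invented for illustration.
    synthetic = [[0.0, 0.0], [1.0, 0.0]]
    real_near = [[0.1, 0.0], [0.9, 0.1]]
    real_far = [[5.0, 5.0], [6.0, 5.0]]
    print(mmd2_biased(synthetic, real_near))  # small: distributions overlap
    print(mmd2_biased(synthetic, real_far))   # large: distributions are far apart
```

A value near zero indicates similar feature distributions; comparing the value before and after fine-tuning would be one way to quantify whether the gap narrows.
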
minor comments (2)
  1. [Method] The description of the geometric rectification step should include the exact transformation used and any failure cases observed on real images.
  2. [Dataset] Dataset release statement should specify license, exact split sizes, and annotation protocol for reproducibility.
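
To make minor comment 1 concrete: one standard choice for plate rectification is a four-point perspective transform (homography) mapping the detected plate corners onto an upright rectangle. The sketch below solves for that transform in pure Python. It is an assumption about what such a step could look like, not the paper's algorithm; the corner coordinates and output size are invented for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corner configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography(src: Sequence[Point], dst: Sequence[Point]) -> List[List[float]]:
    """3x3 perspective transform taking four src corners to four dst corners."""
    if len(src) != 4 or len(dst) != 4:
        raise ValueError("exactly four point pairs are required")
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)  # eight unknowns; the ninth entry is fixed to 1
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(h: List[List[float]], pt: Point) -> Point:
    """Map a single point through the homography h."""
    x, y = pt
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


if __name__ == "__main__":
    # Invented corners of a tilted plate (TL, TR, BR, BL) and a 200x60 target.
    corners = [(12.0, 20.0), (118.0, 8.0), (125.0, 52.0), (9.0, 60.0)]
    target = [(0.0, 0.0), (200.0, 0.0), (200.0, 60.0), (0.0, 60.0)]
    h = homography(corners, target)
    print([apply_homography(h, c) for c in corners])
```

In practice the same transform is usually obtained from an image library, for example OpenCV's `cv2.getPerspectiveTransform` followed by `cv2.warpPerspective`; the failure cases the referee asks about typically arise when the four corners are mis-detected.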

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight areas where additional details and analyses will strengthen the manuscript. We address each major point below and will incorporate the requested information and experiments in the revised version.

point-by-point responses
  1. Referee: [Abstract / Experiments] Abstract and Experiments section: the headline 89.6% character accuracy is stated without test-set cardinality, number of plates or images evaluated, baseline comparisons, per-character error breakdown, or statistical tests, so the central empirical claim cannot be assessed or attributed to the proposed mechanisms.

    Authors: We agree that the current presentation of the 89.6% character-level accuracy lacks sufficient context for proper evaluation. In the revised manuscript we will explicitly state the test-set cardinality (number of images and distinct plates), add comparisons to relevant LPDR baselines, include a per-character error breakdown, and report statistical significance tests. These additions will allow readers to assess the claim and attribute gains to the synthetic pretraining, geometric rectification, and VLM fallback components. revision: yes

  2. Referee: [Method] Method section (VLM fallback): no ablation, accuracy delta, or latency measurement is supplied for the confidence-triggered Gemma3 invocation versus the base recognizer, nor is the trigger threshold calibrated or justified on held-out data.

    Authors: We acknowledge that the VLM fallback mechanism is not yet supported by ablation results or threshold justification. We will add an ablation comparing the base recognizer against the full pipeline with confidence-triggered Gemma3-4B fallback, report the resulting accuracy delta, measure the associated latency overhead, and describe the procedure used to select and validate the confidence threshold on held-out data. revision: yes

  3. Referee: [Experiments / Dataset] Experiments / Dataset sections: no domain-shift quantification (FID, MMD, or similar) between Blender synthetic plates and La Paz real plates is reported, leaving the claim that synthetic pretraining plus domain adaptation closes the gap unsupported.

    Authors: We agree that explicit quantification of the domain gap would better support the synthetic-pretraining strategy. In the revision we will compute and report domain-discrepancy metrics (FID and/or MMD) between the Blender-generated synthetic plates and the real La Paz images, thereby providing quantitative evidence that the pretraining plus fine-tuning pipeline reduces the distribution shift. revision: yes
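
The first response promises test-set cardinality and statistical tests. As a rough illustration of why cardinality matters, the sketch below computes a Wilson score interval for an accuracy proportion. The counts in the example are hypothetical and chosen only to match an 89.6% rate; the review does not report the paper's actual test-set size. Characters on the same plate are also not independent, so an interval like this is likely optimistic.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives about 95%)."""
    if total <= 0 or not 0 <= successes <= total:
        raise ValueError("need total > 0 and 0 <= successes <= total")
    p = successes / total
    denom = 1.0 + z * z / total
    center = (p + z * z / (2.0 * total)) / denom
    half_width = z * math.sqrt(p * (1.0 - p) / total + z * z / (4.0 * total * total)) / denom
    return center - half_width, center + half_width


if __name__ == "__main__":
    # Hypothetical counts: the same 89.6% rate at two different test-set sizes.
    for correct, n in [(896, 1000), (8960, 10000)]:
        low, high = wilson_interval(correct, n)
        print(f"n={n}: [{low:.4f}, {high:.4f}]")
```

The interval narrows as the number of evaluated characters grows, which is why a headline accuracy without a count is hard to compare against baselines.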

Circularity Check

0 steps flagged

No circularity: purely empirical accuracy on held-out real data

full rationale

The paper describes a two-stage DL pipeline (YOLO detector pretrained on Blender synthetics then fine-tuned on La Paz street data, plus selective Gemma3 VLM fallback) and reports a measured character-level accuracy of 89.6% on real-world held-out data. No equations, fitted parameters, derivations, or self-citations appear in the provided text that would reduce this accuracy figure to the training inputs by construction. The result is an externally measured performance metric on unseen real images, not a self-referential prediction or renamed input.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The performance claim rests on the assumption that synthetic Blender data transfers effectively to real Bolivian conditions and that the VLM fallback adds value only when needed. No new physical entities are postulated.

free parameters (1)
  • VLM trigger confidence threshold
    The cutoff used to decide when to invoke Gemma3 is a tunable hyperparameter whose exact value and tuning procedure are not stated.
axioms (2)
  • domain assumption: Blender synthetic images sufficiently approximate real-world viewpoint and illumination variations for domain adaptation
    Invoked to justify pretraining the YOLO detector before fine-tuning on La Paz data.
  • domain assumption: The collected La Paz street-level images are representative of target deployment conditions
    Used both for fine-tuning and for the reported 89.6% accuracy evaluation.

pith-pipeline@v0.9.0 · 5557 in / 1598 out tokens · 101581 ms · 2026-05-10T16:55:19.844040+00:00 · methodology

