HalalBench: A Multilingual OCR Benchmark for Food Packaging Ingredient Extraction
Pith reviewed 2026-05-15 20:41 UTC · model grok-4.3
The pith
HalalBench provides the first multilingual OCR benchmark for food packaging ingredient labels.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors present HalalBench as the first open benchmark for multilingual OCR on food packaging. It contains 1,043 images with 36,438 annotations, and the evaluation shows that current engines struggle, particularly with dense text, small fonts, and non-Latin scripts such as Japanese.
What carries the argument
HalalBench dataset of real and synthetic food packaging images annotated in COCO format for ingredient text across 14 languages, used to benchmark OCR engines and test post-processing.
If this is right
- Popular OCR engines achieve F1 scores of 0.193 or lower on the benchmark.
- A clustering-based post-processing step improves F1 scores by 36%.
- All engines fail completely on Japanese text with F1 of 0.000.
- The benchmark supports development of OCR for automated halal food verification.
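The F1 figures above combine precision and recall of the recognized text. The page does not state the paper's matching protocol, so the following is only a minimal sketch that assumes word-level, case-insensitive multiset matching; `word_f1` is a hypothetical helper, not the authors' scoring code.

```python
from collections import Counter


def word_f1(predicted: str, reference: str) -> float:
    """Word-level F1 between OCR output and ground-truth text.

    Assumption: words are whitespace-separated and compared
    case-insensitively; repeated words count with multiplicity.
    """
    pred = Counter(predicted.lower().split())
    ref = Counter(reference.lower().split())
    # Multiset intersection: each word matches at most min(count) times.
    overlap = sum((pred & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Toy example (made-up strings, not benchmark data):
# 2 of 3 predicted words are correct, 2 of 4 reference words are found.
print(word_f1("sugar salt water", "sugar salt palm oil"))
```

Whitespace tokenization is a simplification; scripts written without spaces between words, such as Japanese, would need a character-level or segmenter-based variant.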
Where Pith is reading between the lines
- This benchmark could help improve OCR accuracy for small text on curved surfaces in consumer products.
- Real-world halal scanners may benefit from targeting the weaknesses this benchmark identifies in current OCR technology.
- Future work could expand the dataset with more real images to better match production conditions.
Load-bearing premise
The synthetic images are representative enough of real food packaging challenges like curved surfaces and tiny fonts.
What would settle it
Running the same OCR engines on hundreds of additional real food packaging photos and finding substantially higher or lower accuracy than reported on HalalBench.
Original abstract
No standardized benchmark exists for evaluating OCR on food packaging, despite its critical role in automated halal food verification. Existing benchmarks target documents or scene text, missing the unique challenges of ingredient labels: curved surfaces, dense multilingual text, and sub-8pt fonts. We present HalalBench, the first open multilingual benchmark for food packaging OCR, comprising 1,043 images (50 real, 993 synthetic) with 36,438 annotations in COCO format spanning 14 languages. We evaluate four engines: docTR achieves F1=0.193, ML Kit 0.180, EasyOCR 0.167, while all fail on Japanese (F1=0.000). A clustering ablation shows 36% F1 improvement from our post-processing algorithm. We validate findings through HalalLens (https://halallens.no), a production halal scanner serving 20+ countries. Dataset and code are released under open licenses.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces HalalBench, the first open multilingual OCR benchmark for food packaging ingredient extraction. It comprises 1,043 images (50 real, 993 synthetic) with 36,438 COCO-format annotations spanning 14 languages. The authors evaluate four OCR engines (docTR F1=0.193, ML Kit 0.180, EasyOCR 0.167, all failing on Japanese), demonstrate a 36% F1 improvement via clustering-based post-processing, and validate via the HalalLens production scanner serving 20+ countries. Dataset and code are released openly.
Significance. If the synthetic images accurately model real packaging distortions, this benchmark would address a genuine gap in OCR evaluation for practical domains like automated halal verification. The open data release and empirical baselines against existing engines are positive contributions that could support future method development. The low absolute F1 scores highlight task difficulty, but the work's value hinges on benchmark representativeness.
major comments (1)
- [Abstract] Abstract: The claim that the 993 synthetic images represent real-world challenges (curved surfaces, dense multilingual text, sub-8pt fonts) is load-bearing for the reported F1 scores and 36% post-processing gain, yet no quantitative validation such as distribution matching on curvature, text density, or font-size histograms is provided between the 50 real and 993 synthetic images.
minor comments (1)
- [Abstract] Abstract: The statement 'We validate findings through HalalLens' lacks any specifics on the validation methodology, metrics, or results from the production system.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on HalalBench. We address the single major comment below and will revise the manuscript to incorporate quantitative validation of the synthetic images.
- Referee: [Abstract] Abstract: The claim that the 993 synthetic images represent real-world challenges (curved surfaces, dense multilingual text, sub-8pt fonts) is load-bearing for the reported F1 scores and 36% post-processing gain, yet no quantitative validation such as distribution matching on curvature, text density, or font-size histograms is provided between the 50 real and 993 synthetic images.
Authors: We agree that the manuscript would be strengthened by explicit quantitative validation showing that the synthetic images model the same distribution of challenges as the real ones. In the revised version we will add a dedicated subsection (and corresponding appendix figures) that reports: (1) font-size histograms computed from the COCO bounding-box heights for both sets, (2) text-density statistics (characters and words per image), and (3) curvature estimates obtained by fitting quadratic surfaces to the detected text regions. These comparisons will be presented alongside the existing qualitative examples to confirm that the synthetic generation pipeline reproduces the target real-world distortions.
Revision: yes
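The first two comparisons promised in the rebuttal can be sketched directly from COCO annotations. This is a hedged illustration, not the authors' pipeline: it assumes the standard COCO `bbox` layout `[x, y, width, height]`, uses box height in pixels as a rough proxy for font size, uses annotations per image as a stand-in for text density, and compares the two sets with a hand-written two-sample Kolmogorov-Smirnov statistic (a choice made here, not one named in the source). All function names and the toy data are hypothetical.

```python
from bisect import bisect_right


def bbox_heights(coco: dict) -> list[float]:
    """Heights of all annotation boxes; COCO bbox is [x, y, width, height]."""
    return [ann["bbox"][3] for ann in coco["annotations"]]


def annotations_per_image(coco: dict) -> float:
    """Mean number of text annotations per image (proxy for text density)."""
    return len(coco["annotations"]) / len(coco["images"])


def ks_statistic(a: list[float], b: list[float]) -> float:
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    gap = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


# Toy stand-ins for the real and synthetic splits (made-up values).
real = {"images": [{"id": 1}],
        "annotations": [{"bbox": [0, 0, 40, 8]}, {"bbox": [0, 10, 50, 10]}]}
synthetic = {"images": [{"id": 1}],
             "annotations": [{"bbox": [0, 0, 40, 8]}, {"bbox": [0, 10, 30, 12]}]}

print(ks_statistic(bbox_heights(real), bbox_heights(synthetic)))
print(annotations_per_image(real), annotations_per_image(synthetic))
```

A statistic near 0 would indicate similar height distributions and a value near 1 would indicate almost no overlap. Converting pixel heights to point sizes would additionally require the physical scale of each photo, which this sketch does not model.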
Circularity Check
No circularity: dataset release and empirical evaluation only
The paper introduces HalalBench (1,043 images, 50 real + 993 synthetic) and reports F1 scores for four existing OCR engines plus a post-processing ablation. No equations, fitted parameters, or derivations appear in the abstract or described content. The central claim is the benchmark itself; synthetic-image fidelity is a methodological assumption but is not defined in terms of the reported results or reduced by construction to any input. No self-citation chain, uniqueness theorem, or ansatz is invoked to support a derivation. The work is therefore self-contained against external benchmarks.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · tag: unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "We present HalalBench, the first open multilingual benchmark for food packaging OCR, comprising 1,043 images (50 real, 993 synthetic) with 36,438 annotations in COCO format spanning 14 languages."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.