Informatics for Food Processing
Pith reviewed 2026-05-22 13:25 UTC · model grok-4.3
The pith
Machine learning models can classify food processing levels at scale from nutrient data and descriptions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A random forest model trained on nutrient composition data infers processing levels and yields a continuous FPro score, while language models embed food descriptions and ingredient lists for prediction. Applied to the Open Food Facts database, these multimodal methods classify foods at scale and supply a reproducible alternative to subjective frameworks such as NOVA, Nutri-Score, and SIGA.
What carries the argument
FoodProX random forest model that maps nutrient composition to a continuous processing score, augmented by BERT embeddings of text descriptions for handling incomplete records.
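To make the step from classifier output to continuous score concrete, the sketch below collapses four NOVA class probabilities into one number. The formula FPro = (1 - p1 + p4) / 2 is the one usually attributed to the original FoodProX publication; it does not appear in the abstract reviewed here, so it is an assumption, as are the function name `fpro_score` and the example probabilities.

```python
def fpro_score(probs):
    """Collapse NOVA class probabilities [p1, p2, p3, p4] into a score in [0, 1].

    Assumed formula (not stated in the reviewed abstract):
        FPro = (1 - p1 + p4) / 2
    """
    if len(probs) != 4:
        raise ValueError("expected four class probabilities (NOVA 1-4)")
    if any(p < 0 for p in probs) or abs(sum(probs) - 1.0) > 1e-6:
        raise ValueError("probabilities must be non-negative and sum to 1")
    p1, p4 = probs[0], probs[3]
    return (1.0 - p1 + p4) / 2.0


# Hypothetical classifier outputs, invented for illustration only.
examples = {
    "raw spinach": [1.0, 0.0, 0.0, 0.0],
    "canned beans": [0.25, 0.0, 0.5, 0.25],
    "cola": [0.0, 0.0, 0.0, 1.0],
}
scores = {name: fpro_score(p) for name, p in examples.items()}
```

Under this assumed formula, a product the classifier is certain is unprocessed scores 0, one it is certain is ultra-processed scores 1, and uncertain cases fall in between.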
If this is right
- Automated labeling removes the need for repeated manual review when updating large food databases.
- Continuous scores allow finer-grained statistical analysis than the usual discrete categories.
- Models remain usable even when some nutrient or text fields are missing from a product record.
- Classification can be rerun quickly whenever the underlying database is updated.
Where Pith is reading between the lines
- The same pipeline could be tested on national dietary survey data to check consistency with existing processing estimates.
- Integration with purchase or consumption records might reveal how processing level correlates with actual intake patterns.
- Retraining the model on regional or branded products could expose whether current scores transfer across markets.
- Linking the resulting scores to longitudinal health records would test whether the inferred levels track known health associations more closely than older categorical systems.
Load-bearing premise
The nutrient values and written descriptions stored in databases such as Open Food Facts are complete enough and accurate enough to train models that generalize without adding new systematic errors.
What would settle it
A direct comparison of model-assigned processing levels against classifications performed independently by several human experts on the same set of several hundred foods, with agreement measured by percentage match or Cohen's kappa.
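The two agreement measures named above can be computed in a few lines. The sketch below compares one set of model labels against one set of reference labels; the label lists are invented for illustration. With several experts, the reference would have to be a consensus label, or a multi-rater statistic such as Fleiss' kappa would replace Cohen's kappa.

```python
import math
from collections import Counter


def percent_agreement(a, b):
    """Fraction of items on which the two label lists agree."""
    if len(a) != len(b) or not a:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Cohen's kappa: agreement corrected for chance, for two raters."""
    n = len(a)
    p_o = percent_agreement(a, b)  # observed agreement
    count_a, count_b = Counter(a), Counter(b)
    # Expected agreement if both raters labeled independently
    # according to their own marginal label frequencies.
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        return math.nan  # both raters used a single identical label; kappa is undefined
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical NOVA labels for eight foods (illustrative values only).
model_labels = [1, 1, 4, 4, 3, 4, 1, 4]
expert_labels = [1, 1, 4, 3, 3, 4, 2, 4]

agreement = percent_agreement(model_labels, expert_labels)
kappa = cohens_kappa(model_labels, expert_labels)
```

Kappa is lower than raw agreement here because part of the observed match is expected by chance alone, which is why both figures are worth reporting.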
Original abstract
This chapter explores the evolution, classification, and health implications of food processing, while emphasizing the transformative role of machine learning, artificial intelligence (AI), and data science in advancing food informatics. It begins with a historical overview and a critical review of traditional classification frameworks such as NOVA, Nutri-Score, and SIGA, highlighting their strengths and limitations, particularly the subjectivity and reproducibility challenges that hinder epidemiological research and public policy. To address these issues, the chapter presents novel computational approaches, including FoodProX, a random forest model trained on nutrient composition data to infer processing levels and generate a continuous FPro score. It also explores how large language models like BERT and BioBERT can semantically embed food descriptions and ingredient lists for predictive tasks, even in the presence of missing data. A key contribution of the chapter is a novel case study using the Open Food Facts database, showcasing how multimodal AI models can integrate structured and unstructured data to classify foods at scale, offering a new paradigm for food processing assessment in public health and research.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript reviews historical and traditional food processing classification frameworks such as NOVA, Nutri-Score, and SIGA, noting their subjectivity and reproducibility limitations for epidemiological research. It introduces computational approaches including the FoodProX random forest model trained on nutrient composition data to infer processing levels and output a continuous FPro score, alongside BERT and BioBERT models for semantically embedding food descriptions and ingredient lists. A case study on the Open Food Facts database demonstrates multimodal integration of structured and unstructured data for large-scale food classification.
Significance. If the models prove accurate and generalizable, the work could offer a scalable, reproducible alternative to subjective classification systems, enabling better integration of food processing data into public health research and policy.
major comments (2)
- [Case study] The case study description states that FoodProX is trained on nutrient composition data to infer processing levels from the Open Food Facts database, yet no performance metrics, cross-validation results, confusion matrices, or baseline comparisons are provided to support the claim of reliable inference.
- [FoodProX model] NOVA categories are defined by the extent and purpose of industrial processing (e.g., addition of cosmetic additives, extrusion) rather than final nutrient vectors; the manuscript does not address how the random forest handles cases where two products share nearly identical nutrient profiles but differ in unlisted additives or processing methods.
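The second major comment suggests a simple diagnostic: group labeled products by their rounded nutrient vector and flag groups whose members carry different NOVA labels. The sketch below is one way to do that, not the authors' method. The record layout (`name`, `nutrients`, `nova`), the rounding tolerance, and the product values are all hypothetical.

```python
from collections import defaultdict


def find_label_conflicts(products, decimals=0):
    """Return nutrient-profile groups whose members disagree on the NOVA label.

    Products are grouped by their nutrient vector rounded to `decimals`
    places; a group is reported when it contains more than one distinct label.
    """
    groups = defaultdict(list)
    for product in products:
        key = tuple(round(v, decimals) for v in product["nutrients"])
        groups[key].append(product)

    conflicts = {}
    for key, members in groups.items():
        labels = {m["nova"] for m in members}
        if len(labels) > 1:
            conflicts[key] = sorted(m["name"] for m in members)
    return conflicts


# Hypothetical records: [energy kcal, sugar g, fat g, protein g] per 100 g.
products = [
    {"name": "plain yogurt", "nutrients": [61.2, 4.7, 3.3, 3.4], "nova": 1},
    {"name": "thickened yogurt", "nutrients": [61.4, 4.6, 3.2, 3.3], "nova": 4},
    {"name": "olive oil", "nutrients": [884.0, 0.0, 100.0, 0.0], "nova": 2},
]
conflicts = find_label_conflicts(products)
```

The share of training records that fall into such conflicting groups gives a rough lower bound on the error a purely nutrient-based model cannot avoid at that tolerance.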
minor comments (2)
- Clarify the handling of missing or noisy ingredient strings when applying BERT embeddings.
- Specify how ground-truth NOVA labels were assigned or validated for the training set.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We appreciate the opportunity to clarify and strengthen the presentation of the case study and the FoodProX model.
Point-by-point responses
- Referee: [Case study] The case study description states that FoodProX is trained on nutrient composition data to infer processing levels from the Open Food Facts database, yet no performance metrics, cross-validation results, confusion matrices, or baseline comparisons are provided to support the claim of reliable inference.
  Authors: We acknowledge that the manuscript as currently written does not report quantitative performance metrics, cross-validation results, confusion matrices, or baseline comparisons for FoodProX within the case study section. To address this gap, we will add these evaluations in the revised manuscript, including 5-fold cross-validation accuracy, precision-recall metrics, and comparisons against simpler baselines such as logistic regression and k-nearest neighbors on the same nutrient feature set.
  Revision: yes
- Referee: [FoodProX model] NOVA categories are defined by the extent and purpose of industrial processing (e.g., addition of cosmetic additives, extrusion) rather than final nutrient vectors; the manuscript does not address how the random forest handles cases where two products share nearly identical nutrient profiles but differ in unlisted additives or processing methods.
  Authors: The referee correctly identifies that NOVA is based on processing purpose and methods rather than nutrient composition alone. FoodProX treats nutrient vectors as a statistical proxy for processing level, trained on products where NOVA labels are available. We will revise the manuscript to include an explicit limitations subsection discussing the risk of misclassification for products with similar nutrient profiles but differing unlisted additives or processing steps. We will also note how the multimodal BERT component on ingredient lists is intended to complement the nutrient-based model in such ambiguous cases.
  Revision: yes
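The evaluation promised in the first response can be outlined as below: a minimal 5-fold cross-validation loop that scores a model against a trivial baseline on the same features. It is a sketch under stated assumptions. A 1-nearest-neighbor rule and a majority-class predictor stand in for the random forest, logistic regression, and k-NN models named in the rebuttal; the folds are not stratified; and the toy data are invented.

```python
import random
from collections import Counter


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def majority_baseline(train_X, train_y, x):
    """Ignore the features and predict the most common training label."""
    return Counter(train_y).most_common(1)[0][0]


def nearest_neighbor(train_X, train_y, x):
    """Predict the label of the closest training point (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(row, x)) for row in train_X]
    return train_y[dists.index(min(dists))]


def cross_val_accuracy(predict, X, y, k=5, seed=0):
    """Mean held-out accuracy of `predict` over k folds."""
    accuracies = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        hits = sum(predict(train_X, train_y, X[i]) == y[i] for i in test_idx)
        accuracies.append(hits / len(test_idx))
    return sum(accuracies) / k


# Toy data (hypothetical): two well-separated clusters labeled NOVA 1 and NOVA 4.
X = [[i * 0.1, i * 0.1] for i in range(10)] + [[10 + i * 0.1, 10 + i * 0.1] for i in range(10)]
y = [1] * 10 + [4] * 10

knn_acc = cross_val_accuracy(nearest_neighbor, X, y)
base_acc = cross_val_accuracy(majority_baseline, X, y)
```

Reporting the baseline next to the model shows how much of the accuracy comes from the nutrient features rather than from class imbalance. On real data, per-class precision and recall and a confusion matrix would be computed from the same held-out predictions.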
Circularity Check
No circularity: standard supervised ML on external database labels
Full rationale
The paper presents FoodProX as a random forest trained on nutrient composition data from Open Food Facts to infer processing levels and produce an FPro score, alongside BERT embeddings for ingredient lists. This is a conventional supervised learning pipeline: the model is fitted to pre-existing external annotations (e.g., NOVA categories) and does not derive its outputs from self-referential equations, fitted parameters renamed as predictions, or load-bearing self-citations. No derivation chain reduces to its own inputs by construction, and the approach can be checked against external benchmarks without relying on hidden ansatzes or renamed known results.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Nutrient profiles and ingredient text contain sufficient signal to predict processing level.