Are Vision-Language Models Ready for Dietary Assessment? Exploring the Next Frontier in AI-Powered Food Image Recognition

Aythami Morales; Blanca Lacruz-Pleguezuelos; Enrique Carrillo de Santa Pau; Guadalupe X. Bazán; Isabel Espinosa-Salinas; Javier Ortega-Garcia; Julian Fierrez; Laura Judith Marcos Zambrano; Ruben Tolosana; Sergio Romero-Tapiador

arXiv: 2504.06925 · v1 · pith:7IGMHAM7 · submitted 2025-04-09 · cs.CV · cs.AI

Pith reviewed 2026-05-22 19:57 UTC · model grok-4.3

keywords: vision-language models · food image recognition · dietary assessment · Expert-Weighted Recall · FoodNExTDB · closed-source models · fine-grained classification

The pith

Closed-source vision-language models reach over 90 percent expert-weighted recall on single-product food images and outperform open-source alternatives.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests six vision-language models on food recognition tasks relevant to automatic dietary assessment from photos. It introduces the FoodNExTDB database of 9,263 expert-labeled images spanning 10 categories, 62 subcategories, and 9 cooking styles, plus about 50,000 nutritional annotations from seven experts. The authors also define an Expert-Weighted Recall (EWR) metric that adjusts scores for disagreement among those annotators. Results indicate that closed-source models handle simple single-item images well, while all models still falter on fine details such as cooking methods or visually similar foods.

Core claim

Closed-source models such as ChatGPT, Gemini, and Claude achieve over 90 percent EWR when identifying food products in single-item images, whereas the open-source models (Moondream, DeepSeek, and LLaVA) lag behind. The evaluation rests on the new FoodNExTDB collection and the EWR metric, which incorporates inter-annotator variability. The work shows that current VLMs remain limited in fine-grained recognition of cooking styles and similar-looking items, which restricts their immediate use for reliable automatic dietary assessment.

What carries the argument

The FoodNExTDB database of expert-annotated images together with the Expert-Weighted Recall metric that accounts for annotator differences when scoring model outputs at multiple levels of food detail.
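
The exact EWR formula is not reproduced on this page. As a rough illustration of how a recall metric can account for inter-annotator variability, the Python sketch below assumes one plausible weighting: each label is weighted by the fraction of experts who assigned it to the image, and EWR is the weighted share of those labels that the model recovers. The function names, the weighting rule, and the toy labels are assumptions for illustration, not the authors' definition. For contrast, the sketch also computes plain recall against a majority-vote consensus.

```python
from collections import Counter


def label_weights(expert_labels: list[set[str]]) -> dict[str, float]:
    """Weight each label by the fraction of experts who assigned it (assumed scheme)."""
    if not expert_labels:
        raise ValueError("need at least one expert annotation")
    votes = Counter(label for labels in expert_labels for label in labels)
    return {label: count / len(expert_labels) for label, count in votes.items()}


def expert_weighted_recall(expert_labels: list[set[str]], predicted: set[str]) -> float:
    """Hypothetical EWR for one image: weighted share of expert labels the model recovered."""
    weights = label_weights(expert_labels)
    total = sum(weights.values())
    if total == 0:
        return 0.0  # no expert assigned any label
    return sum(w for label, w in weights.items() if label in predicted) / total


def majority_recall(expert_labels: list[set[str]], predicted: set[str]) -> float:
    """Plain recall against the labels chosen by more than half of the experts."""
    consensus = {label for label, w in label_weights(expert_labels).items() if w > 0.5}
    if not consensus:
        return 0.0
    return len(consensus & predicted) / len(consensus)


# Toy image: three hypothetical experts disagree on the cooking style.
experts = [{"poultry", "grilled"}, {"poultry", "grilled"}, {"poultry", "fried"}]
print(round(expert_weighted_recall(experts, {"poultry", "grilled"}), 3))  # 0.833
print(majority_recall(experts, {"poultry", "grilled"}))  # 1.0
```

Under this assumed weighting, the minority "fried" label still costs the model some credit, whereas majority-vote recall discards it entirely. Whether the paper's EWR behaves this way can only be confirmed against its Methods section.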

If this is right

VLMs could already support basic dietary logging tools when images contain only one clear food item.
Further work on distinguishing cooking styles and similar foods would be required before broader reliability.
The public FoodNExTDB collection gives other teams a shared benchmark for testing new models.
The performance gap between closed- and open-source models bears on which models to choose when building nutrition applications.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Apps could pair closed-source VLMs with simple user confirmation for plates containing multiple foods.
The closed-source advantage may affect how widely accessible AI dietary tools become in the near term.
Extending the same evaluation to images from varied lighting or cultural cuisines would test whether the observed limits persist.

Load-bearing premise

The FoodNExTDB database with its expert annotations and the Expert-Weighted Recall metric form a valid and representative benchmark for how vision-language models would perform in real dietary assessment.

What would settle it

A follow-up test on the same models using non-expert labels or everyday multi-item photos that produces substantially lower EWR scores would show the benchmark overstates practical performance.

Figures

Figures reproduced from arXiv: 2504.06925 by Aythami Morales, Blanca Lacruz-Pleguezuelos, Enrique Carrillo de Santa Pau, Guadalupe X. Bazán, Isabel Espinosa-Salinas, Javier Ortega-Garcia, Julian Fierrez, Laura Judith Marcos Zambrano, Ruben Tolosana, Sergio Romero-Tapiador.

**Figure 1.** Overview of the proposed framework. (A) The FoodNExTDB consists of 9,263 food images labeled by nutrition experts across …

**Figure 2.** Illustration of the proposed Expert-Weighted Recall (EWR) computation for a food image.

**Figure 3.** Examples of VLM predictions compared to nutritionists' annotations. (A) A multi-component dish where some experts identify …

**Figure 4.** Radar charts illustrating VLM performance in fine-grained food recognition. We include some examples of all available classes.

Original abstract

Automatic dietary assessment based on food images remains a challenge, requiring precise food detection, segmentation, and classification. Vision-Language Models (VLMs) offer new possibilities by integrating visual and textual reasoning. In this study, we evaluate six state-of-the-art VLMs (ChatGPT, Gemini, Claude, Moondream, DeepSeek, and LLaVA), analyzing their capabilities in food recognition at different levels. For the experimental framework, we introduce the FoodNExTDB, a unique food image database that contains 9,263 expert-labeled images across 10 categories (e.g., "protein source"), 62 subcategories (e.g., "poultry"), and 9 cooking styles (e.g., "grilled"). In total, FoodNExTDB includes 50k nutritional labels generated by seven experts who manually annotated all images in the database. Also, we propose a novel evaluation metric, Expert-Weighted Recall (EWR), that accounts for the inter-annotator variability. Results show that closed-source models outperform open-source ones, achieving over 90% EWR in recognizing food products in images containing a single product. Despite their potential, current VLMs face challenges in fine-grained food recognition, particularly in distinguishing subtle differences in cooking styles and visually similar food items, which limits their reliability for automatic dietary assessment. The FoodNExTDB database is publicly available at https://github.com/AI4Food/FoodNExtDB.
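
The abstract describes labels at three levels of granularity (10 categories, 62 subcategories, 9 cooking styles). The Python sketch below shows one way a single-product prediction could be scored separately at each level. The `FoodLabel` record, its field names, and the per-level hit-rate function are hypothetical; only the example strings ("protein source", "poultry", "grilled") come from the abstract. The point it illustrates is the one the results stress: a model can be correct at the category and subcategory levels yet wrong about the cooking style.

```python
from dataclasses import dataclass
from typing import Optional

LEVELS = ("category", "subcategory", "cooking_style")


@dataclass(frozen=True)
class FoodLabel:
    """Hypothetical single-product label at the three FoodNExTDB levels."""
    category: str                        # e.g. "protein source"
    subcategory: str                     # e.g. "poultry"
    cooking_style: Optional[str] = None  # e.g. "grilled"; None if not annotated


def per_level_hit_rate(pairs: list[tuple[FoodLabel, FoodLabel]]) -> dict[str, float]:
    """Fraction of (truth, prediction) pairs that match at each level."""
    hits = {level: 0 for level in LEVELS}
    counts = {level: 0 for level in LEVELS}
    for truth, pred in pairs:
        for level in LEVELS:
            expected = getattr(truth, level)
            if expected is None:
                continue  # level not annotated for this item, so it is skipped
            counts[level] += 1
            hits[level] += int(getattr(pred, level) == expected)
    # Report only the levels that were annotated at least once.
    return {level: hits[level] / counts[level] for level in LEVELS if counts[level]}


truth = FoodLabel("protein source", "poultry", "grilled")
pred = FoodLabel("protein source", "poultry", "fried")
print(per_level_hit_rate([(truth, pred)]))
# {'category': 1.0, 'subcategory': 1.0, 'cooking_style': 0.0}
```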

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The new FoodNExTDB dataset and EWR metric are worth noting, but the single-product results don't strongly support broad claims on dietary assessment readiness.

The main takeaway here is the release of FoodNExTDB, a dataset with 9,263 expert-labeled food images and 50k nutritional annotations, plus the Expert-Weighted Recall metric designed to deal with differences between annotators. That combination gives the field something concrete to work with beyond just another model comparison. The paper evaluates six VLMs on food recognition at various levels of detail, from broad categories down to cooking styles. Closed-source options come out ahead with over 90 percent EWR on single-product shots, while open-source models trail. It also does a good job of calling out the remaining difficulties in telling apart similar foods or subtle preparation differences. Making the database public is a plus that lets others verify or extend the work.

Where it gets softer is in connecting these results to actual dietary assessment. The tests focus on images showing one item at a time, but real meals usually have several foods together, with varying angles and lighting. There's no reported check on whether strong EWR performance reduces errors in estimating calories or nutrients from full plates. The metric accounts for inter-annotator variability nicely on paper, but without evidence that EWR scores track real-world accuracy on held-out data better than standard recall does, its advantage stays somewhat theoretical. Details on how the models were queried, and whether any fine-tuning was done, would help too, though the abstract suggests the full paper covers the framework.

Readers who care about benchmarks for vision-language models in health applications will find this useful. It's a solid empirical study that adds resources rather than just claiming big advances. The argument holds because the numbers are direct measurements against expert labels, with no obvious circularity. This paper should go to peer review. Referees can push for broader scene tests and validation against actual intake data, but the core contributions are worth the time.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces FoodNExTDB, a dataset of 9,263 expert-annotated food images spanning 10 categories, 62 subcategories, and 9 cooking styles, along with 50k nutritional labels from seven experts. It evaluates six VLMs (ChatGPT, Gemini, Claude, Moondream, DeepSeek, LLaVA) on food recognition using a new Expert-Weighted Recall (EWR) metric designed to account for inter-annotator variability, reporting that closed-source models achieve over 90% EWR on single-product images while noting limitations in fine-grained recognition of cooking styles and similar items.

Significance. The public release of FoodNExTDB and the EWR metric represent a concrete contribution to benchmarking VLMs for food image tasks. If the single-product results generalize and EWR correlates with downstream dietary assessment utility, the work could help identify gaps in current models. However, the restriction to single-product images and absence of external validation against nutrient estimation errors on multi-item meals reduce the immediate applicability to real-world dietary assessment.

major comments (3)

[Abstract] The headline result (>90% EWR for closed-source models) is reported only for single-product images (Abstract), yet the introduction frames the study as addressing automatic dietary assessment, which typically involves multi-item plates; no results or analysis on multi-product images are described to support the readiness claim.
[Abstract] The EWR metric is presented as accounting for inter-annotator variability (Abstract), but no formula, weighting scheme, or comparison to standard recall is provided; without this, it is unclear whether EWR provides a meaningfully different or more robust evaluation than conventional metrics.
[Experimental framework] The dataset contains 9,263 images across 10 categories with expert annotations, but the evaluation is restricted to single-product subsets without reported data splits, error analysis by category, or correlation of EWR scores with actual nutrient intake prediction error on held-out real meals.

minor comments (2)

[Abstract] The abstract lists example categories and subcategories but does not include a summary table of image counts per category or inter-annotator agreement statistics; adding this would improve clarity of the dataset contribution.
[Abstract] The GitHub link for FoodNExTDB is provided, but the manuscript does not specify the exact license or any usage restrictions for the 50k nutritional labels.
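
The request for inter-annotator agreement statistics can be met with several standard measures. One common choice, named here as a swapped-in technique rather than anything the paper reports, is Fleiss' kappa for a single categorical level such as the food category. The Python sketch below assumes that every image was labeled by the same number of experts and that each expert chose exactly one class per image; the input layout (an images × classes count matrix) and the toy ratings are hypothetical.

```python
def fleiss_kappa(counts: list[list[int]]) -> float:
    """Fleiss' kappa from an items x categories matrix of rater counts.

    counts[i][j] is the number of raters who assigned item i to category j.
    Assumes every item was rated by the same number of raters (at least two).
    """
    if not counts:
        raise ValueError("need at least one item")
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number of raters, at least two")
    n_categories = len(counts[0])
    # Share of all assignments that fell into each category.
    p_j = [sum(row[j] for row in counts) / (n_items * n_raters) for j in range(n_categories)]
    # Observed agreement per item: fraction of rater pairs that agree.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1)) for row in counts]
    p_bar = sum(p_i) / n_items
    p_e = sum(p * p for p in p_j)  # agreement expected by chance
    if p_e == 1:
        raise ValueError("kappa is undefined when every rating falls in one category")
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical data: three images, seven experts, two candidate categories.
ratings = [[7, 0], [0, 7], [4, 3]]
print(round(fleiss_kappa(ratings), 3))  # 0.618
```

A per-level kappa like this would complement, not replace, a weighted recall: it summarizes how much the experts agree with each other, independently of any model.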

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for their constructive comments. We address each major comment below and indicate planned revisions.

Referee: [Abstract] The headline result (>90% EWR for closed-source models) is reported only for single-product images (Abstract), yet the introduction frames the study as addressing automatic dietary assessment, which typically involves multi-item plates; no results or analysis on multi-product images are described to support the readiness claim.

Authors: We agree that real-world dietary assessment typically involves multi-item plates. Our study deliberately focuses on single-product images to provide a controlled benchmark of VLM food recognition capabilities. We will revise the abstract to explicitly qualify the >90% EWR result as applying to single-product images and expand the introduction to discuss the gap for multi-item scenarios without overstating readiness for full dietary assessment. (Revision: partial.)
Referee: [Abstract] The EWR metric is presented as accounting for inter-annotator variability (Abstract), but no formula, weighting scheme, or comparison to standard recall is provided; without this, it is unclear whether EWR provides a meaningfully different or more robust evaluation than conventional metrics.

Authors: The EWR formula and weighting scheme based on inter-annotator agreement are defined in the Methods section. We will add a brief description of the EWR formula and a direct comparison to standard recall in the abstract and results to clarify its advantages. (Revision: yes.)
Referee: [Experimental framework] The dataset contains 9,263 images across 10 categories with expert annotations, but the evaluation is restricted to single-product subsets without reported data splits, error analysis by category, or correlation of EWR scores with actual nutrient intake prediction error on held-out real meals.

Authors: We will report data splits and include error analysis by category in the revised experimental section. Correlation of EWR with nutrient intake prediction error on held-out real meals is not performed in this work, as it would require additional multi-item meal data and downstream nutrient validation experiments outside the current benchmarking scope. (Revision: partial.)

standing simulated objections not resolved

Correlation of EWR scores with actual nutrient intake prediction error on held-out real meals

Circularity Check

0 steps flagged

No significant circularity: purely empirical evaluation against external expert labels

full rationale

This is an empirical evaluation study that introduces the FoodNExTDB dataset with 50k expert-generated nutritional labels and defines the EWR metric to account for inter-annotator variability. Performance results (e.g., >90% EWR for closed-source VLMs on single-product images) are computed directly by comparing model outputs to the independent expert annotations. There are no mathematical derivations, fitted parameters renamed as predictions, self-citation load-bearing premises, uniqueness theorems, or ansatzes smuggled via citation. The work is self-contained against external benchmarks with no reduction of claims to the paper's own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claims rest on the validity of expert annotations as ground truth and the appropriateness of the new EWR metric for dietary assessment evaluation.

axioms (1)

Domain assumption: manual annotations by seven experts provide accurate and consistent labels for food categories, subcategories, cooking styles, and nutritional information.
The entire evaluation framework and EWR metric depend on these labels serving as reliable ground truth.

invented entities (1)

Expert-Weighted Recall (EWR) metric (no independent evidence)
purpose: To evaluate model performance while accounting for inter-annotator variability among experts.
This is a novel metric introduced in the paper to handle variability in expert labels.

Reference graph

Works this paper leans on

52 extracted references · 52 canonical work pages · 8 internal anchors

[1]

GPT-4 Technical Report

Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ah- mad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. GPT-4 Technical Report. arXiv preprint arXiv:2303.08774,

work page internal anchor Pith review Pith/arXiv arXiv
[2]

Food Recognition using Fusion of Classifiers Based on CNNs

Eduardo Aguilar, Marc Bola ˜nos, and Petia Radeva. Food Recognition using Fusion of Classifiers Based on CNNs. In Proc. of the International Conference on Image Analysis and Processing, 2017. 2

work page 2017
[3]

Gemini: A Family of Highly Capable Multimodal Models

Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, et al. Gemini: A Family of Highly Capable Multimodal Models. arXiv preprint arXiv:2312.11805, 2023. 1, 4

work page internal anchor Pith review Pith/arXiv arXiv 2023
[4]

The Claude 3 Model Family: Opus, Sonnet, Haiku

AI Anthropic. The Claude 3 Model Family: Opus, Sonnet, Haiku. Claude-3 Model Card, 1:1, 2024. 4

work page 2024
[5]

Li, Adrien Bardes, Suzanne Petryk, Oscar Ma ˜nas, et al

Florian Bordes, Richard Yuanzhe Pang, Anurag Ajay, Alexander C Li, Adrien Bardes, Suzanne Petryk, Oscar Ma˜nas, Zhiqiu Lin, Anas Mahmoud, Bargav Jayaraman, et al. An Introduction to Vision-Language Modeling. arXiv preprint arXiv:2405.17247, 2024. 2

work page arXiv 2024
[6]

Food-101 – Mining Discriminative Components with Ran- dom Forests

Lukas Bossard, Matthieu Guillaumin, and Luc Van Gool. Food-101 – Mining Discriminative Components with Ran- dom Forests. In Proc. of the European Conference on Com- puter Vision, 2014. 2

work page 2014
[7]

Recognition of Food Images Based on Transfer Learning and Ensemble Learning

Le Bu, Caiping Hu, and Xiuliang Zhang. Recognition of Food Images Based on Transfer Learning and Ensemble Learning. Plos One, 19(1):e0296789, 2024. 1, 3

work page 2024
[8]

Deep-based Ingredient Recognition for Cooking Recipe Retrieval

Jingjing Chen and Chong-Wah Ngo. Deep-based Ingredient Recognition for Cooking Recipe Retrieval. In Proc. of the International Conference on Multimedia, 2016. 2

work page 2016
[9]

Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling

Xiaokang Chen, Zhiyu Wu, Xingchao Liu, Zizheng Pan, Wen Liu, Zhenda Xie, Xingkai Yu, and Chong Ruan. Janus- pro: Unified Multimodal Understanding and Generation with Data and Model Scaling. arXiv preprint arXiv:2501.17811,

work page internal anchor Pith review Pith/arXiv arXiv
[10]

Food Recognition: A New Dataset, Experiments, and Results

Gianluigi Ciocca, Paolo Napoletano, and Raimondo Schet- tini. Food Recognition: A New Dataset, Experiments, and Results. IEEE Journal of Biomedical and Health Informat- ics, 21(3):588–598, 2016. 2

work page 2016
[11]

How Good is ChatGPT at Face Biometrics? a First Look into Recognition, Soft Biometrics, and Explain- ability

Ivan Deandres-Tame, Ruben Tolosana, Ruben Vera- Rodriguez, Aythami Morales, Julian Fierrez, and Javier Ortega-Garcia. How Good is ChatGPT at Face Biometrics? a First Look into Recognition, Soft Biometrics, and Explain- ability. IEEE Access, 12:34390–34401, 2024. 1

work page 2024
[12]

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Petko Georgiev, Ving Ian Lei, Ryan Burnell, et al. Gemini 1.5: Unlocking Multimodal Understanding across Millions of Tokens of Context. arXiv preprint arXiv:2403.05530 ,

work page internal anchor Pith review Pith/arXiv arXiv
[13]

Health–environment Efficiency of Diets Shows Nonlinear Trends over 1990–2011

Pan He, Zhu Liu, Giovanni Baiocchi, Dabo Guan, Yan Bai, and Klaus Hubacek. Health–environment Efficiency of Diets Shows Nonlinear Trends over 1990–2011. Nature Food, 5 (2):116–124, 2024. 1

work page 1990
[14]

Squeeze-and-Excitation Net- works

Jie Hu, Li Shen, and Gang Sun. Squeeze-and-Excitation Net- works. In Proc. of the Conference on Computer Vision and Pattern Recognition, 2018. 3

work page 2018
[15]

GPT-4o System Card

Aaron Hurst, Adam Lerer, Adam P Goucher, Adam Perel- man, Aditya Ramesh, Aidan Clark, AJ Ostrow, Akila Weli- hinda, Alan Hayes, Alec Radford, et al. GPT-4o System Card. arXiv preprint arXiv:2410.21276, 2024. 4

work page internal anchor Pith review Pith/arXiv arXiv 2024
[16]

Kawano and K

Y . Kawano and K. Yanai. Automatic Expansion of a Food Image Dataset Leveraging Existing Categories with Domain Adaptation. In Proc. of the Workshop on Transferring and Adapting Source Knowledge in Computer Vision, 2014. 2

work page 2014
[17]

Multimodal Food Image Classification with Large Language Models

Jun-Hwa Kim, Nam-Ho Kim, Donghyeok Jo, and Chee Sun Won. Multimodal Food Image Classification with Large Language Models. Electronics, 13(22), 2024. 3

work page 2024
[18]

BLIP-2: Bootstrapping Language-image Pre-training with Frozen Image Encoders and Large Language Models

Junnan Li, Dongxu Li, Silvio Savarese, and Steven Hoi. BLIP-2: Bootstrapping Language-image Pre-training with Frozen Image Encoders and Large Language Models. In Proc. of the International Conference on Machine Learning,

work page
[19]

VILA: On Pre-training for Vi- sual Language Models

Ji Lin, Hongxu Yin, Wei Ping, Pavlo Molchanov, Moham- mad Shoeybi, and Song Han. VILA: On Pre-training for Vi- sual Language Models. In Proc. of the Conference on Com- puter Vision and Pattern Recognition, 2024. 2

work page 2024
[20]

Perspec- tive: Data in Personalized Nutrition: Bridging Biomedi- cal, Psycho-behavioral, and Food Environment Approaches for Population-wide Impact

Jakob Linseisen, Britta Renner, Kurt Gedrich, Jan Wirsam, Christina Holzapfel, Stefan Lorkowski, Bernhard Watzl, Hannelore Daniel, Michael Leitzmann, et al. Perspec- tive: Data in Personalized Nutrition: Bridging Biomedi- cal, Psycho-behavioral, and Food Environment Approaches for Population-wide Impact. Advances in Nutrition , page 100377, 2025. 1

work page 2025
[21]

DeepSeek-V3 Technical Report

Aixin Liu, Bei Feng, Bing Xue, Bingxuan Wang, Bochao Wu, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chenyu Zhang, Chong Ruan, et al. Deepseek-v3 Technical Report. arXiv preprint arXiv:2412.19437, 2024. 1

work page internal anchor Pith review Pith/arXiv arXiv 2024
[22]

Improved Baselines with Visual Instruction Tuning

Haotian Liu, Chunyuan Li, Yuheng Li, and Yong Jae Lee. Improved Baselines with Visual Instruction Tuning. InProc. of the Conference on Computer Vision and Pattern Recogni- tion, 2024. 4

work page 2024
[23]

Research on Food Image Recognition of Deep Learning Algorithms

Lihua Luo. Research on Food Image Recognition of Deep Learning Algorithms. In Proc. of the International Confer- ence on Computers, Information Processing and Advanced Education, 2023. 1, 3

work page 2023
[24]

Ahuja, and Cheng-I Wei

Peihua Ma, Shawn Tsai, Yiyang He, Xiaoxue Jia, Dongyang Zhen, Ning Yu, Qin Wang, Jaspreet K.C. Ahuja, and Cheng-I Wei. Large Language Models in Food Science: Innovations, Applications, and Future. Trends in Food Science & Tech- nology, 148:104488, 2024. 1

work page 2024
[25]

Integrating Vision-Language Models for Accelerated High- Throughput Nutrition Screening

Peihua Ma, Yixin Wu, Ning Yu, Xiaoxue Jia, Yiyang He, Yang Zhang, Michael Backes, Qin Wang, and Cheng-I Wei. Integrating Vision-Language Models for Accelerated High- Throughput Nutrition Screening. Advanced Science, 11(34): 2403578, 2024. 3

work page 2024
[26]

Food-500 Cap: A Fine-Grained Food Caption Benchmark for Evalu- ating Vision-Language Models

Zheng Ma, Mianzhi Pan, Wenhan Wu, Kanzhi Cheng, Jian- bing Zhang, Shujian Huang, and Jiajun Chen. Food-500 Cap: A Fine-Grained Food Caption Benchmark for Evalu- ating Vision-Language Models. In Proc. of the International Conference on Multimedia, 2023. 3

work page 2023
[27]

Matsuda, H

Y . Matsuda, H. Hoashi, and K. Yanai. Recognition of Multiple-Food Images by Detecting Candidate Regions. In Proc. of the International Conference on Multimedia and Expo, 2012. 2

work page 2012
[28]

Patrick McAllister, Huiru Zheng, Raymond Bond, and Anne Moorhead. Combining Deep Residual Neural Network Fea- tures with Supervised Machine Learning Algorithms to Clas- sify Diverse Food Image Datasets.Computers in Biology and Medicine, 95:217–233, 2018. 3

work page 2018
[29]

A Survey on Food Computing

Weiqing Min, Shuqiang Jiang, Linhu Liu, Yong Rui, and Ramesh Jain. A Survey on Food Computing. ACM Com- puting Surveys, 52(5):1–36, 2019. 1

work page 2019
[30]

ISIA Food-500: A Dataset for Large-Scale Food Recognition via Stacked Global-Local Attention Network

Weiqing Min, Linhu Liu, Zhiling Wang, Zhengdong Luo, Xiaoming Wei, Xiaolin Wei, and Shuqiang Jiang. ISIA Food-500: A Dataset for Large-Scale Food Recognition via Stacked Global-Local Attention Network. In Proc. of the In- ternational Conference on Multimedia, 2020. 2, 3

work page 2020
[31]

Large Scale Visual Food Recognition

Weiqing Min, Zhiling Wang, Yuxin Liu, Mengjiang Luo, Liping Kang, Xiaoming Wei, Xiaolin Wei, and Shuqiang Jiang. Large Scale Visual Food Recognition. IEEE Trans- actions on Pattern Analysis and Machine Intelligence, 45(8): 9932–9949, 2023. 3

work page 2023
[32]

Llava-chef: A Multi- modal Generative Model for Food Recipes

Fnu Mohbat and Mohammed J Zaki. Llava-chef: A Multi- modal Generative Model for Food Recipes. In Proc. of the International Conference on Information and Knowledge Management, 2024. 3

work page 2024
[33]

An Explainable CNN and Vision Transformer-Based Approach for Real-Time Food Recognition

Kintoh Allen Nfor, Tagne Poupi Theodore Armand, Kenes- baeva Periyzat Ismaylovna, Moon-Il Joo, and Hee-Cheol Kim. An Explainable CNN and Vision Transformer-Based Approach for Real-Time Food Recognition. Nutrients, 17 (2):362, 2025. 3

work page 2025
[34]

Using LLMs to Extract Food Entities from Cooking Recipes

Vasiliki Pitsilou, George Papadakis, and Dimitrios Skoutas. Using LLMs to Extract Food Entities from Cooking Recipes. In Proc. of the International Conference on Data Engineer- ing Workshops, 2024. 1

work page 2024
[35]

FoodGPT: A Large Language Model in Food Test- ing Domain with Incremental Pre-training and Knowledge Graph Prompt

Zhixiao Qi, Yijiong Yu, Meiqi Tu, Junyi Tan, and Yongfeng Huang. FoodGPT: A Large Language Model in Food Test- ing Domain with Incremental Pre-training and Knowledge Graph Prompt. arXiv preprint arXiv:2308.10173, 2023. 1

work page arXiv 2023
[36]

Learning Transferable Visual Models from Natural Language Super- vision

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning Transferable Visual Models from Natural Language Super- vision. In Proc. of the International Conference on Machine Learning, 2021. 3

work page 2021
[37]

Dining on Details: LLM-Guided Expert Net- works for Fine-Grained Food Recognition

Jes ´us M Rodr´ıguez-de Vera, Pablo Villacorta, Imanol G Es- tepa, Marc Bola˜nos, Ignacio Saras´ua, Bhalaji Nagarajan, and Petia Radeva. Dining on Details: LLM-Guided Expert Net- works for Fine-Grained Food Recognition. InProc. of the In- ternational Workshop on Multimedia Assisted Dietary Man- agement, 2023. 1

work page 2023
[38]

LOFI: LOng- tailed FIne-Grained Network for Food Recognition

Jes ´us M Rodr ´ıguez-De-Vera, Imanol G Estepa, Marc Bola˜nos, Bhalaji Nagarajan, and Petia Radeva. LOFI: LOng- tailed FIne-Grained Network for Food Recognition. In Proc. of the Conference on Computer Vision and Pattern Recogni- tion, 2024. 3

work page 2024
[39]

AI4FoodDB: A Database for Per- sonalized e-Health Nutrition and Lifestyle through Wear- able Devices and Artificial Intelligence

Sergio Romero-Tapiador, Blanca Lacruz-Pleguezuelos, Ruben Tolosana, et al. AI4FoodDB: A Database for Per- sonalized e-Health Nutrition and Lifestyle through Wear- able Devices and Artificial Intelligence. Database, 2023: baad049, 2023. 2, 3

work page 2023
[40]

AI4Food-NutritionFW: A Novel Frame- work for the Automatic Synthesis and Analysis of Eating Behaviours

Sergio Romero-Tapiador, Ruben Tolosana, Aythami Morales, et al. AI4Food-NutritionFW: A Novel Frame- work for the Automatic Synthesis and Analysis of Eating Behaviours. IEEE Access, 1:112199 – 112211, 2023. 1

work page 2023
[41]

Leveraging Automatic Personalised Nu- trition: Food Image Recognition Benchmark and Dataset Based on Nutrition Taxonomy

Sergio Romero-Tapiador, Ruben Tolosana, Aythami Morales, et al. Leveraging Automatic Personalised Nu- trition: Food Image Recognition Benchmark and Dataset Based on Nutrition Taxonomy. Multimedia Tools and Applications, 84:1945–1966, 2024. 1, 3

work page 1945
[42]

Personalized Weight Loss Management through Wearable Devices and Artificial Intelligence

Sergio Romero-Tapiador, Ruben Tolosana, Aythami Morales, et al. Personalized Weight Loss Management through Wearable Devices and Artificial Intelligence. arXiv preprint arXiv:2409.08700, 2024. 8

work page internal anchor Pith review Pith/arXiv arXiv 2024
[43]

Losing Visual Needles in Image Haystacks: Vision Lan- guage Models are Easily Distracted in Short and Long Con- texts

Aditya Sharma, Michael Saxon, and William Yang Wang. Losing Visual Needles in Image Haystacks: Vision Lan- guage Models are Easily Distracted in Short and Long Con- texts. arXiv preprint arXiv:2406.16851, 2024. 2

work page arXiv 2024
[44]

A Lightweight Hybrid Model with Location-preserving ViT for Efficient Food Recognition

Guorui Sheng, Weiqing Min, Xiangyi Zhu, Liang Xu, Qing- shuo Sun, Yancun Yang, Lili Wang, and Shuqiang Jiang. A Lightweight Hybrid Model with Location-preserving ViT for Efficient Food Recognition. Nutrients, 16(2):200, 2024. 1, 3

work page 2024
[45]

Why and How the Indo-Mediterranean Diet May Be Superior to Other Diets: The Role of Antioxidants in the Diet

Ram B Singh, Jan Fedacko, Ghizal Fatima, Aminat Magomedova, Shaw Watanabe, and Galal Elkilany. Why and How the Indo-Mediterranean Diet May Be Superior to Other Diets: The Role of Antioxidants in the Diet. Nutrients, 14 (4):898, 2022. 1

work page 2022
[46]

Food/Non-Food Image Classification and Food Categoriza- tion Using Pre-Trained GoogLeNet Model

Ashutosh Singla, Lin Yuan, and Touradj Ebrahimi. Food/Non-Food Image Classification and Food Categoriza- tion Using Pre-Trained GoogLeNet Model. In Proc. of the International Workshop on Multimedia Assisted Dietary Management, 2016. 2

work page 2016
[47]

Nutrition5k: Towards Automatic Nutritional Understanding of Generic Food

Quin Thames, Arjun Karpur, Wade Norris, Fangting Xia, Liviu Panait, Tobias Weyand, and Jack Sim. Nutrition5k: Towards Automatic Nutritional Understanding of Generic Food. In Proc. of the Conference on Computer Vision and Pattern Recognition, 2021. 2

work page 2021
[48]

Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation

Chengyue Wu, Xiaokang Chen, Zhiyu Wu, Yiyang Ma, Xingchao Liu, Zizheng Pan, Wen Liu, Zhenda Xie, Xingkai Yu, Chong Ruan, et al. Janus: Decoupling Visual Encod- ing for Unified Multimodal Understanding and Generation. arXiv preprint arXiv:2410.13848, 2024. 4

work page internal anchor Pith review Pith/arXiv arXiv 2024
[49]

ChatDiet: Empowering Personalized Nutrition- oriented Food Recommender Chatbots through an LLM- Augmented Framework

Zhongqi Yang, Elahe Khatibi, Nitish Nagesh, Mahyar Ab- basian, Iman Azimi, Ramesh Jain, and Amir M Rah- mani. ChatDiet: Empowering Personalized Nutrition- oriented Food Recommender Chatbots through an LLM- Augmented Framework. Smart Health , 32:100465, 2024. 1

work page 2024
[50]

FoodLMM: A Versatile Food Assistant Using Large Multi-modal Model

Yuehao Yin, Huiyan Qi, Bin Zhu, Jingjing Chen, Yu-Gang Jiang, and Chong-Wah Ngo. FoodLMM: A Versatile Food Assistant Using Large Multi-modal Model. arXiv preprint arXiv:2312.14991, 2023. 3

work page arXiv 2023
[51]

LLM-based Hierarchical Label Anno- tation for Foodborne Illness Detection on Social Media

Dongyu Zhang, Ruofan Hu, Dandan Tao, Hao Feng, and Elke Rundensteiner. LLM-based Hierarchical Label Anno- tation for Foodborne Illness Detection on Social Media. In Proc. of the International Conference on Big Data, 2024. 1

[52] Ping Zhang. Influence of Foods and Nutrition on the Gut Microbiome and Implications for Intestinal Health. International Journal of Molecular Sciences, 23(17):9588, 2022. 1

