OmniFood8K: Single-Image Nutrition Estimation via Hierarchical Frequency-Aligned Fusion

Dongjian Yu; Qian Jiang; Shuqiang Jiang; Weiqing Min; Xing Lin; Xin Jin

arxiv: 2604.12356 · v1 · submitted 2026-04-14 · 💻 cs.CV

OmniFood8K: Single-Image Nutrition Estimation via Hierarchical Frequency-Aligned Fusion

Dongjian Yu , Weiqing Min , Qian Jiang , Xing Lin , Xin Jin , Shuqiang Jiang This is my paper

Pith reviewed 2026-05-10 14:51 UTC · model grok-4.3

classification 💻 cs.CV

keywords food nutrition estimationsingle-image predictiondepth estimationfrequency domain fusionChinese food datasetsynthetic data augmentationmultimodal feature fusion

0 comments

The pith

Predicting depth from a single RGB image and fusing it with RGB features in the frequency domain enables more accurate food nutrition estimation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This work creates OmniFood8K, a dataset with 8036 food samples focused on Chinese dishes that includes nutritional annotations and multi-view images. It also builds a large synthetic dataset called NutritionSynth-115K to add compositional variety while keeping exact nutrition labels. The proposed method starts by estimating a depth map from one RGB photo, refines that depth for consistency, then uses hierarchical frequency alignment to combine depth and color features before predicting nutrition values with a mask that highlights ingredients. The goal is to make nutrition tracking possible from ordinary photos without depth cameras or limited to Western foods. Experiments across datasets show better results than prior techniques.

Core claim

By predicting a depth map from a single RGB image and refining it with a Scale-Shift Residual Adapter for scale and structure, then hierarchically aligning and fusing the RGB and depth features in the frequency domain through the Frequency-Aligned Fusion Module, and finally applying a Mask-based Prediction Head to focus on key regions, the method achieves improved nutritional predictions that surpass existing approaches on multiple datasets including the new OmniFood8K.

What carries the argument

The Frequency-Aligned Fusion Module (FAFM) that performs hierarchical alignment and fusion of RGB and depth features in the frequency domain to capture better compositional details for nutrition.

If this is right

Nutrition estimation becomes feasible using only standard camera photos in daily settings.
Coverage expands to Chinese and other non-Western cuisines through the dedicated dataset.
Synthetic data with preserved labels helps train models on varied food compositions.
Frequency domain processing of multimodal features improves accuracy for ingredient-based predictions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Such a system could power mobile apps that scan meals for instant dietary feedback.
The fusion technique might transfer to estimating other properties like freshness or allergens from photos.
Further gains could come from integrating this with real depth data when available or improving the initial depth prediction.
Large-scale synthetic data generation may reduce reliance on expensive manual annotations for similar vision tasks.

Load-bearing premise

The synthetic dataset must preserve precise nutritional labels while adding realistic variations, and the frequency fusion with predicted depth must deliver accuracy gains over standard RGB processing.

What would settle it

A test where the full model is compared to a version without the frequency fusion module on the OmniFood8K validation set, and no reduction in prediction error for nutrients like calories or protein is observed.

Figures

Figures reproduced from arXiv: 2604.12356 by Dongjian Yu, Qian Jiang, Shuqiang Jiang, Weiqing Min, Xing Lin, Xin Jin.

**Figure 2.** Figure 2: Overview of the OmniFood8K dataset: data collection process and category distribution. [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: Overview of the proposed method. The figure illustrates the overall pipeline of our method, consisting of three proposed modules: [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗

read the original abstract

Accurate estimation of food nutrition plays a vital role in promoting healthy dietary habits and personalized diet management. Most existing food datasets primarily focus on Western cuisines and lack sufficient coverage of Chinese dishes, which restricts accurate nutritional estimation for Chinese meals. Moreover, many state-of-the-art nutrition prediction methods rely on depth sensors, restricting their applicability in daily scenarios. To address these limitations, we introduce OmniFood8K, a comprehensive multimodal dataset comprising 8,036 food samples, each with detailed nutritional annotations and multi-view images. In addition, to enhance models' capability in nutritional prediction, we construct NutritionSynth-115K, a large-scale synthetic dataset that introduces compositional variations while preserving precise nutritional labels. Moreover, we propose an end-to-end framework for nutritional prediction from a single RGB image. First, we predict a depth map from a single RGB image and design the Scale-Shift Residual Adapter (SSRA) to refine it for global scale consistency and local structural preservation. Second, we propose the Frequency-Aligned Fusion Module (FAFM) to hierarchically align and fuse RGB and depth features in the frequency domain. Finally, we design a Mask-based Prediction Head (MPH) to emphasize key ingredient regions via dynamic channel selection for more accurate prediction. Extensive experiments on multiple datasets demonstrate the superiority of our method over existing approaches. Project homepage: https://yudongjian.github.io/OmniFood8K-food/

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

New Chinese-food dataset and single-RGB nutrition pipeline using depth prediction plus frequency fusion.

read the letter

Hey, the main takeaway is a new dataset of Chinese dishes with nutrition labels and a single-image method that predicts depth then fuses it with RGB features in the frequency domain to estimate calories and macros better than prior RGB-only baselines. They also built a large synthetic set to add realistic variations during training. The work fills a clear gap since most food datasets skew Western and many strong methods still need depth cameras that people don't carry around. The three modules line up logically: the scale-shift adapter cleans up the predicted depth map for consistency, the hierarchical frequency alignment module combines the two modalities without simple channel stacking, and the mask head directs attention to ingredient regions. That architecture is coherent for the stated goal of practical daily use. The dataset itself looks useful on paper, with 8k real multi-view samples plus 115k synthetic ones that keep precise labels. Soft spots are mostly about verification. The abstract asserts superiority but gives no error numbers, confidence intervals, or ablation tables, so the size of the gains and whether each module actually moves the needle remain to be checked in the full results. It would also help to see the exact procedure for generating the synthetic compositions and any checks that they don't introduce label drift or unrealistic mixes. Minor issues like that, nothing load-bearing from what is described. This is aimed at the food-computing and mobile-health crowd rather than general CV. Someone building nutrition apps or working on multimodal regression would get concrete value from the data and the fusion details. The thinking is straightforward and the motivation holds up, so it deserves a serious referee rather than a desk reject.

Referee Report

2 major / 3 minor

Summary. The manuscript introduces OmniFood8K, a multimodal dataset of 8,036 food samples with nutritional annotations and multi-view images focused on Chinese cuisines, along with the synthetic NutritionSynth-115K dataset for compositional augmentation. It proposes an end-to-end single-RGB nutrition estimation framework that first predicts and refines a depth map via the Scale-Shift Residual Adapter (SSRA), hierarchically aligns and fuses RGB-depth features in the frequency domain using the Frequency-Aligned Fusion Module (FAFM), and applies a Mask-based Prediction Head (MPH) for ingredient-region emphasis. The central claim is that this pipeline outperforms prior methods on multiple datasets.

Significance. If the quantitative results and ablations hold, the work is significant for enabling practical single-image nutrition estimation without depth sensors and for filling a gap in non-Western food datasets. The combination of synthetic data generation, frequency-domain fusion, and mask-based prediction offers a coherent technical approach that could influence mobile health and dietary applications.

major comments (2)

[§4.1, Table 1] §4.1 and Table 1: The superiority claim over RGB-only baselines rests on the reported gains from SSRA+FAFM+MPH, but the ablation study does not isolate the contribution of frequency alignment versus simple concatenation; without this breakdown the load-bearing role of FAFM remains unclear.
[§3.2] §3.2: The assertion that NutritionSynth-115K preserves precise nutritional labels while adding realistic variations is central to training validity, yet the data-generation procedure (ingredient sampling, rendering parameters) is described at a high level without pseudocode or validation metrics against real distributions.

minor comments (3)

[Figure 3] Figure 3: The frequency-domain visualization would benefit from explicit axis labels and a side-by-side comparison with spatial-domain fusion to clarify the alignment benefit.
[§2] §2: Several citations to prior food datasets (e.g., Food-101, Nutrition5K) are present but lack discussion of their Western bias statistics, which would strengthen the motivation for OmniFood8K.
[Eq. (7)] Notation: The definition of the hierarchical frequency alignment loss in Eq. (7) uses an undefined weighting hyperparameter λ; clarify its value and sensitivity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and insightful comments. We address each major comment point by point below and will revise the manuscript to incorporate the requested clarifications, which we believe will strengthen the presentation of our contributions.

read point-by-point responses

Referee: [§4.1, Table 1] §4.1 and Table 1: The superiority claim over RGB-only baselines rests on the reported gains from SSRA+FAFM+MPH, but the ablation study does not isolate the contribution of frequency alignment versus simple concatenation; without this breakdown the load-bearing role of FAFM remains unclear.

Authors: We agree that the existing ablation table shows the combined effect of the full pipeline but does not isolate the benefit of frequency-domain alignment in FAFM against a direct concatenation baseline. To address this, we will add a new ablation row in the revised Table 1 (and corresponding discussion in §4.1) that replaces FAFM with hierarchical concatenation of RGB and depth features while keeping SSRA and MPH fixed. This will provide a direct comparison and clarify the specific contribution of the frequency alignment mechanism. revision: yes
Referee: [§3.2] §3.2: The assertion that NutritionSynth-115K preserves precise nutritional labels while adding realistic variations is central to training validity, yet the data-generation procedure (ingredient sampling, rendering parameters) is described at a high level without pseudocode or validation metrics against real distributions.

Authors: We acknowledge that Section 3.2 currently provides only a high-level overview of the synthetic data pipeline. In the revised manuscript we will expand this section with (i) pseudocode for the ingredient sampling and rendering procedure and (ii) quantitative validation metrics, including distributional comparisons (e.g., KL divergence on nutritional vectors and visual feature statistics) between NutritionSynth-115K and the real OmniFood8K samples. These additions will substantiate the claim that precise labels are preserved while realistic variations are introduced. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper introduces new datasets (OmniFood8K and NutritionSynth-115K) and an end-to-end architecture (SSRA + FAFM + MPH) for single-image nutrition estimation. No equations, derivations, or parameter-fitting steps appear in the provided text that reduce a claimed prediction or result to an input defined by the same data or self-citation. The central claims rest on experimental superiority across multiple datasets, which constitutes external validation rather than an internal self-referential loop. The method description is technically coherent and does not invoke uniqueness theorems, ansatzes smuggled via prior self-work, or renaming of known results as new derivations.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on standard deep-learning assumptions (CNN feature extractors, frequency-domain operations being beneficial) and the unverified premise that synthetic data generation preserves exact nutrition labels; no free parameters, axioms, or invented entities are explicitly introduced in the abstract.

pith-pipeline@v0.9.0 · 5565 in / 1116 out tokens · 42023 ms · 2026-05-10T14:51:35.366340+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

71 extracted references · 71 canonical work pages · 1 internal anchor

[1]

Explainable Artificial Intelligence Techniques for Interpretation of Food Models: a Review

Leonardo Arrighi, Ingrid Alves de Moraes, Marco Zul- lich, Michele Simonato, Douglas Fernandes Barbin, and Sylvio Barbon Junior. Explainable artificial intelligence techniques for interpretation of food datasets: a review.arXiv preprint arXiv:2504.10527, 2025. 2

work page internal anchor Pith review Pith/arXiv arXiv 2025
[2]

Menu-match: Restaurant-specific food logging from images

Oscar Beijbom, Neel Joshi, Dan Morris, Scott Saponas, and Siddharth Khullar. Menu-match: Restaurant-specific food logging from images. In2015 IEEE Winter Conference on Applications of Computer Vision, pages 844–851, 2015. 4

work page 2015
[3]

Cross-modal hierarchical interaction network for rgb-d salient object detection.Pattern Recogni- tion, 136:109194, 2023

Hongbo Bi, Ranwan Wu, Ziqi Liu, Huihui Zhu, Cong Zhang, and Tian-Zhu Xiang. Cross-modal hierarchical interaction network for rgb-d salient object detection.Pattern Recogni- tion, 136:109194, 2023. 7

work page 2023
[4]

2d prediction of the nutritional compo- sition of dishes from food images: Deep learning algorithm selection and data curation beyond the nutrition5k project

Rachele Bianco et al. 2d prediction of the nutritional compo- sition of dishes from food images: Deep learning algorithm selection and data curation beyond the nutrition5k project. Nutrients, 17(13):2196, 2025. 2

work page 2025
[5]

Food-101–mining discriminative components with random forests

Lukas Bossard, Matthieu Guillaumin, and Luc Van Gool. Food-101–mining discriminative components with random forests. InEuropean Conference on Computer Vision, pages 446–461. Springer, 2014. 2, 4

work page 2014
[6]

Deep-based ingredi- ent recognition for cooking recipe retrieval

Jingjing Chen and Chong-Wah Ngo. Deep-based ingredi- ent recognition for cooking recipe retrieval. InProceedings of the 24th ACM International Conference on Multimedia, pages 32–41, 2016. 2, 4

work page 2016
[7]

Metafood3d: Large 3d food object dataset with nutrition values.arXiv e-prints, pages arXiv–2409,

Yuhao Chen et al. Metafood3d: Large 3d food object dataset with nutrition values.arXiv e-prints, pages arXiv–2409,

work page
[8]

Food recognition and calorie estimation using machine learning

Siddhartha Chinthala, Prem Kumar Erla, Akshaya Dongari, Ajay Bantu, Sai Ganesh Chityala, and M Saravanan. Food recognition and calorie estimation using machine learning. International Journal of Engineering & Extended Technolo- gies Research, 8(2):480–488, 2026. 2

work page 2026
[9]

Advancements in using ai for dietary assessment based on food images: scop- ing review.Journal of Medical Internet Research, 26: e51432, 2024

Phawinpon Chotwanvirat, Aree Prachansuwan, Pimnapanut Sridonpai, and Wantanee Kriengsinyos. Advancements in using ai for dietary assessment based on food images: scop- ing review.Journal of Medical Internet Research, 26: e51432, 2024. 2

work page 2024
[10]

arXiv preprint arXiv:2602.24240 (2026)

Chengyan Deng, Zhangquan Chen, Li Yu, Kai Zhang, Xue Zhou, and Wang Zhang. Joint geometric and trajectory con- sistency learning for one-step real-world super-resolution. arXiv preprint arXiv:2602.24240, 2026. 1

work page arXiv 2026
[11]

Ihmambasr: An importance-guided hierarchi- cal mamba with dynamic prompt for single image super- resolution.Pattern Recognition, page 113057, 2026

Chengyan Deng, Kai Zhang, Lieqiang Yang, Wang Zhang, and Yu Li. Ihmambasr: An importance-guided hierarchi- cal mamba with dynamic prompt for single image super- resolution.Pattern Recognition, page 113057, 2026. 1

work page 2026
[12]

A step forward in food science, technology and industry using artificial intelligence.Trends in Food Science & Technology, 143:104286, 2024

Rezvan Esmaeily, Mohammad Amin Razavi, and Seyed Hadi Razavi. A step forward in food science, technology and industry using artificial intelligence.Trends in Food Science & Technology, 143:104286, 2024. 2

work page 2024
[13]

Single-view food portion estimation based on geometric models

Shaobo Fang, Chang Liu, Fengqing Zhu, Edward J Delp, and Carol J Boushey. Single-view food portion estimation based on geometric models. In2015 IEEE International Sympo- sium on Multimedia (ISM), pages 385–390, 2015. 4

work page 2015
[14]

Ingredient-guided rgb-d fusion network for nutritional assessment.IEEE Transactions on AgriFood Electronics, 2024

Zhihui Feng et al. Ingredient-guided rgb-d fusion network for nutritional assessment.IEEE Transactions on AgriFood Electronics, 2024. 1, 3, 8

work page 2024
[15]

Navigating weight prediction with diet diary

Yinxuan Gui, Bin Zhu, Jingjing Chen, Chong Wah Ngo, and Yu-Gang Jiang. Navigating weight prediction with diet diary. InProceedings of the 32nd ACM International Conference on Multimedia, pages 127–136, 2024. 2

work page 2024
[16]

Dpf-nutrition: Food nutrition estimation via depth prediction and fusion.Foods, 12(23), 2023

Yuzhe Han, Qimin Cheng, Wenjin Wu, and Ziyang Huang. Dpf-nutrition: Food nutrition estimation via depth prediction and fusion.Foods, 12(23), 2023. 1, 2, 7, 8

work page 2023
[17]

Deep residual learning for image recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. InProceed- ings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016. 7

work page 2016
[18]

Searching for mo- bilenetv3

Andrew Howard, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang, Yukun Zhu, Ruoming Pang, Vijay Vasudevan, et al. Searching for mo- bilenetv3. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 1314–1324, 2019. 7

work page 2019
[19]

Weinberger

Gao Huang, Zhuang Liu, Laurens van der Maaten, and Kil- ian Q. Weinberger. Densely connected convolutional net- works. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017. 7

work page 2017
[20]

Psychometric testing of the teacher food and nutrition-related health and wellbe- ing questionnaire.BMC Public Health, 2026

Tammie Jakstas, Andrew Miller, Vanessa A Shrewsbury, Tamara Bucher, and Clare E Collins. Psychometric testing of the teacher food and nutrition-related health and wellbe- ing questionnaire.BMC Public Health, 2026. 1

work page 2026
[21]

Rode: Linear rectified mixture of diverse experts for food large multi-modal models.arXiv preprint arXiv:2407.12730, 2024

Pengkun Jiao, Xinlan Wu, Bin Zhu, Jingjing Chen, Chong- Wah Ngo, and Yugang Jiang. Rode: Linear rectified mixture of diverse experts for food large multi-modal models.arXiv preprint arXiv:2407.12730, 2024. 2, 3, 4, 7

work page arXiv 2024
[22]

A review of image-based food recognition and vol- ume estimation artificial intelligence systems.IEEE Reviews in Biomedical Engineering, 17:136–152, 2023

Fotios S Konstantakopoulos, Eleni I Georga, and Dimitrios I Fotiadis. A review of image-based food recognition and vol- ume estimation artificial intelligence systems.IEEE Reviews in Biomedical Engineering, 17:136–152, 2023. 1

work page 2023
[23]

Tritransnet: Rgb-d salient object detection with a triplet transformer embedding network

Zhengyi Liu, Yuan Wang, Zhengzheng Tu, Yun Xiao, and Bin Tang. Tritransnet: Rgb-d salient object detection with a triplet transformer embedding network. InProceedings of the 29th ACM international conference on multimedia, pages 4481–4490, 2021. 7

work page 2021
[24]

A convnet for the 2020s

Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feicht- enhofer, Trevor Darrell, and Saining Xie. A convnet for the 2020s. InProceedings of the IEEE/CVF Conference on Com- puter Vision and Pattern Recognition, pages 11976–11986,

work page
[25]

Swin transformer: Hierarchical vision trans- former using shifted windows

Ze Liu et al. Swin transformer: Hierarchical vision trans- former using shifted windows. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 10012–10022, 2021. 7

work page 2021
[26]

Food nutri- tion estimation with rgb-d fusion module and bidirectional feature pyramid network.Multimedia Systems, 31(2):1–11,

Boyuan Ma, Donglin Zhang, and Xiao-Jun Wu. Food nutri- tion estimation with rgb-d fusion module and bidirectional feature pyramid network.Multimedia Systems, 31(2):1–11,

work page
[27]

You are what you eat: Ex- ploring rich recipe information for cross-region food anal- ysis.IEEE Transactions on Multimedia, 20(4):950–964,

Weiqing Min, Bing-Kun Bao, Shuhuan Mei, Yaohui Zhu, Yong Rui, and Shuqiang Jiang. You are what you eat: Ex- ploring rich recipe information for cross-region food anal- ysis.IEEE Transactions on Multimedia, 20(4):950–964,

work page
[28]

A survey on food computing.ACM Computing Surveys, 52(5):1–36, 2019

Weiqing Min, Shuqiang Jiang, Linhu Liu, Yong Rui, and Ramesh Jain. A survey on food computing.ACM Computing Surveys, 52(5):1–36, 2019. 1

work page 2019
[29]

Ingredient-guided cascaded multi-attention network for food recognition

Weiqing Min, Linhu Liu, Zhengdong Luo, and Shuqiang Jiang. Ingredient-guided cascaded multi-attention network for food recognition. InProceedings of the 27th ACM In- ternational Conference on Multimedia, pages 1331–1339,

work page
[30]

Isia food-500: A dataset for large-scale food recognition via stacked global-local attention network

Weiqing Min et al. Isia food-500: A dataset for large-scale food recognition via stacked global-local attention network. InProceedings of the 28th ACM International Conference on Multimedia, pages 393–401, 2020. 2, 4

work page 2020
[31]

Large scale visual food recognition.IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(8):9932–9949, 2023

Weiqing Min et al. Large scale visual food recognition.IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(8):9932–9949, 2023. 2, 4

work page 2023
[32]

Food and nutrition in the maha strategy—promise and peril.JAMA, 335(2):119–121, 2026

Dariush Mozaffarian, Emily A Callahan, and William H Frist. Food and nutrition in the maha strategy—promise and peril.JAMA, 335(2):119–121, 2026. 1

work page 2026
[33]

Ingredient-guided multi-modal in- teraction and refinement network for rgb-d food nutrition as- sessment.Digital Signal Processing, 153:104664, 2024

Fudong Nian, Yujie Hu, Yanhong Gu, Zhize Wu, Shimeng Yang, and Jianhua Shu. Ingredient-guided multi-modal in- teraction and refinement network for rgb-d food nutrition as- sessment.Digital Signal Processing, 153:104664, 2024. 7

work page 2024
[34]

A framework for food recognition and pre- dicting its nutritional value through convolution neural net- work

Deepak NR et al. A framework for food recognition and pre- dicting its nutritional value through convolution neural net- work. InProceedings of the International Conference on Innovative Computing & Communication, page 6, 2022. 3

work page 2022
[35]

Dietary intake assess- ment using a novel, generic meal–based recall and a 24-hour recall: Comparison study.Journal of Medical Internet Re- search, 26:e48817, 2024

Cathal O’Hara and Eileen R Gibney. Dietary intake assess- ment using a novel, generic meal–based recall and a 24-hour recall: Comparison study.Journal of Medical Internet Re- search, 26:e48817, 2024. 1

work page 2024
[36]

Fmifood: Multi-modal contrastive learning for food image classifica- tion

Xinyue Pan, Jiangpeng He, and Fengqing Zhu. Fmifood: Multi-modal contrastive learning for food image classifica- tion. In2024 IEEE 26th International Workshop on Multi- media Signal Processing (MMSP), pages 1–6, 2024. 2

work page 2024
[37]

Advancing food nutrition estimation via visual-ingredient feature fusion

Huiyan Qi, Bin Zhu, Chong-Wah Ngo, Jingjing Chen, and Ee-Peng Lim. Advancing food nutrition estimation via visual-ingredient feature fusion. InProceedings of the 2025 International Conference on Multimedia Retrieval, pages 1091–1099, 2025. 1, 2, 4

work page 2025
[38]

Machine learning-driven precision nutrition: A paradigm evolution in dietary assessment and intervention

Wenbin Quan, Jingbo Zhou, Juan Wang, Jihong Huang, and Liping Du. Machine learning-driven precision nutrition: A paradigm evolution in dietary assessment and intervention. Nutrients, 18(1):45, 2025. 1

work page 2025
[39]

Concerns around ev- idence that food processing should be included in dietary guidance.Nature Medicine, pages 1–3, 2026

Eric Robinson and Ciar ´an G Forde. Concerns around ev- idence that food processing should be included in dietary guidance.Nature Medicine, pages 1–3, 2026. 2

work page 2026
[40]

Are vision-language mod- els ready for dietary assessment? exploring the next frontier in ai-powered food image recognition

Sergio Romero-Tapiador et al. Are vision-language mod- els ready for dietary assessment? exploring the next frontier in ai-powered food image recognition. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 430–439, 2025. 2

work page 2025
[41]

Learning cross-modal embeddings for cooking recipes and food images

Amaia Salvador et al. Learning cross-modal embeddings for cooking recipes and food images. InProceedings of the IEEE Conference on Computer Vision and Pattern Recogni- tion, pages 3020–3028, 2017. 2, 4

work page 2017
[42]

Rapid non-destructive analysis of food nutrient content using swin-nutrition.Foods, 11(21), 2022

Wenjing Shao, Sujuan Hou, Weikuan Jia, and Yuanjie Zheng. Rapid non-destructive analysis of food nutrient content using swin-nutrition.Foods, 11(21), 2022. 1, 2, 3, 7

work page 2022
[43]

Vision- based food nutrition estimation via rgb-d fusion network

Wenjing Shao, Weiqing Min, Sujuan Hou, Mengjiang Luo, Tianhao Li, Yuanjie Zheng, and Shuqiang Jiang. Vision- based food nutrition estimation via rgb-d fusion network. Food Chemistry, 424:136309, 2023. 7, 8

work page 2023
[44]

An end-to-end food portion estimation framework based on shape reconstruction from monocular image

Zeman Shao, Gautham Vinod, Jiangpeng He, and Fengqing Zhu. An end-to-end food portion estimation framework based on shape reconstruction from monocular image. In 2023 IEEE ICME, pages 942–947, 2023. 1, 3, 7

work page 2023
[45]

Machine learning based approach on food recognition and nutrition estimation.Procedia Computer Science, 174: 448–453, 2020

Zhidong Shen, Adnan Shehzad, Si Chen, Hui Sun, and Jin Liu. Machine learning based approach on food recognition and nutrition estimation.Procedia Computer Science, 174: 448–453, 2020. 1

work page 2020
[46]

Rice nitrogen nutri- tion estimation with rgb images and machine learning meth- ods.Computers and Electronics in Agriculture, 180:105860,

Peihua Shi, Yuan Wang, Jianmin Xu, Yanling Zhao, Baolin Yang, Zhengqi Yuan, and Qingyun Sun. Rice nitrogen nutri- tion estimation with rgb images and machine learning meth- ods.Computers and Electronics in Agriculture, 180:105860,

work page
[47]

Ai-based digital image dietary assessment methods com- pared to humans and ground truth: a systematic review.An- nals of Medicine, 55(2):2273497, 2023

Eleanor Shonkoff, Kelly Copeland Cara, Xuechen Pei, Mei Chung, Shreyas Kamath, Karen Panetta, and Erin Hennessy. Ai-based digital image dietary assessment methods com- pared to humans and ground truth: a systematic review.An- nals of Medicine, 55(2):2273497, 2023. 1

work page 2023
[48]

Minimum days estimation for reliable dietary intake infor- mation: findings from a digital cohort.European Journal of Clinical Nutrition, pages 1–11, 2025

Rohan Singh, Mathieu Th ´eo Eric Verest, and Marcel Salath´e. Minimum days estimation for reliable dietary intake infor- mation: findings from a digital cohort.European Journal of Clinical Nutrition, pages 1–11, 2025. 1

work page 2025
[49]

Mark H. Stone. The cubit: A history and measurement com- mentary.Journal of Anthropology, 2014(1):489757, 2014. 3

work page 2014
[50]

Rethinking the inception archi- tecture for computer vision

Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. Rethinking the inception archi- tecture for computer vision. InProceedings of the IEEE Con- ference on Computer Vision and Pattern Recognition, pages 2818–2826, 2016. 7

work page 2016
[51]

Efficientnet: Rethinking model scaling for convolutional neural networks

Mingxing Tan and Quoc Le. Efficientnet: Rethinking model scaling for convolutional neural networks. InInternational Conference on Machine Learning, pages 6105–6114. PMLR,

work page
[52]

Reasoning-driven food en- ergy estimation via multimodal large language models.Nu- trients, 17(7):1128, 2025

Hikaru Tanabe and Keiji Yanai. Reasoning-driven food en- ergy estimation via multimodal large language models.Nu- trients, 17(7):1128, 2025. 2

work page 2025
[53]

Nutrition5k: Towards automatic nu- tritional understanding of generic food

Quin Thames et al. Nutrition5k: Towards automatic nu- tritional understanding of generic food. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8903–8911, 2021. 2, 4, 7

work page 2021
[54]

Global food security and sustainability issues: the road to 2030 from nutrition and sustainable healthy diets to food systems change.Foods, 13 (2):306, 2024

Theodoros Varzakas and Slim Smaoui. Global food security and sustainability issues: the road to 2030 from nutrition and sustainable healthy diets to food systems change.Foods, 13 (2):306, 2024. 2

work page 2030
[55]

Image based food energy estimation with depth domain adaptation

Gautham Vinod, Zeman Shao, and Fengqing Zhu. Image based food energy estimation with depth domain adaptation. In2022 IEEE 5th International Conference on Multimedia Information Processing and Retrieval, pages 262–267, 2022. 3, 7

work page 2022
[56]

Coarse-to-fine nutrition prediction

Binglu Wang, Tianci Bu, Zaiyi Hu, Le Yang, Yongqiang Zhao, and Xuelong Li. Coarse-to-fine nutrition prediction. IEEE Transactions on Multimedia, 26:3651–3662, 2023. 2, 7

work page 2023
[57]

Smart fibers and textiles for personal health manage- ment.ACS nano, 15(8):12497–12508, 2021

Huimin Wang, Yong Zhang, Xiaoping Liang, and Yingying Zhang. Smart fibers and textiles for personal health manage- ment.ACS nano, 15(8):12497–12508, 2021. 1

work page 2021
[58]

A review on vision-based analysis for automatic dietary assessment.Trends in Food Science & Technology, 122:223–237, 2022

Wei Wang, Weiqing Min, Tianhao Li, Xiaoxiao Dong, Haisheng Li, and Shuqiang Jiang. A review on vision-based analysis for automatic dietary assessment.Trends in Food Science & Technology, 122:223–237, 2022. 1

work page 2022
[59]

Clare Whitton et al. Accuracy of energy and nu- trient intake estimation versus observed intake using 4 technology-assisted dietary assessment methods: a random- ized crossover feeding study.The American journal of clini- cal nutrition, 120(1):196–210, 2024. 1

work page 2024
[60]

Convnext v2: Co-designing and scal- ing convnets with masked autoencoders

Sanghyun Woo et al. Convnext v2: Co-designing and scal- ing convnets with masked autoencoders. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16133–16142, 2023. 7

work page 2023
[61]

A large-scale benchmark for food im- age segmentation

Xiongwei Wu, Xin Fu, Ying Liu, Ee-Peng Lim, Steven CH Hoi, and Qianru Sun. A large-scale benchmark for food im- age segmentation. InProceedings of the 29th ACM Inter- national Conference on Multimedia, pages 506–515, 2021. 2

work page 2021
[62]

Depth any- thing v2.Advances in Neural Information Processing Sys- tems, 37:21875–21911, 2024

Lihe Yang, Bingyi Kang, Zilong Huang, Zhen Zhao, Xiao- gang Xu, Jiashi Feng, and Hengshuang Zhao. Depth any- thing v2.Advances in Neural Information Processing Sys- tems, 37:21875–21911, 2024. 2, 5

work page 2024
[63]

Spatial-aware multi-modal information fu- sion for food nutrition estimation

Dongjian Yu, Weiqing Min, Xin Jin, Qian Jiang, and Shuqiang Jiang. Spatial-aware multi-modal information fu- sion for food nutrition estimation. InProceedings of the 33rd ACM International Conference on Multimedia, page 8863–8871, 2025. 6

work page 2025
[64]

Cross-modality discrepant interaction net- work for rgb-d salient object detection

Chen Zhang et al. Cross-modality discrepant interaction net- work for rgb-d salient object detection. InProceedings of the 29th ACM International Conference on Multimedia, pages 2094–2102, 2021. 7

work page 2094
[65]

Cmx: Cross-modal fusion for rgb-x semantic segmentation with transformers.IEEE Transactions on Intelligent Transportation Systems, 24(12): 14679–14694, 2023

Jiaming Zhang, Huayao Liu, Kailun Yang, Xinxin Hu, Ruip- ing Liu, and Rainer Stiefelhagen. Cmx: Cross-modal fusion for rgb-x semantic segmentation with transformers.IEEE Transactions on Intelligent Transportation Systems, 24(12): 14679–14694, 2023. 7

work page 2023
[66]

Delivering arbitrary-modal semantic segmentation

Jiaming Zhang et al. Delivering arbitrary-modal semantic segmentation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1136– 1147, 2023. 7

work page 2023
[67]

Recent de- sign strategies and applications of small molecule fluorescent probes for food detection.Coordination Chemistry Reviews, 522:216232, 2025

Peng Zhang, Jiali Su, Hui Zhen, Tong Yu, Liangchen Wei, Mingyue Zheng, Chaoyuan Zeng, and Wei Shu. Recent de- sign strategies and applications of small molecule fluorescent probes for food detection.Coordination Chemistry Reviews, 522:216232, 2025. 2

work page 2025
[68]

Deep learning in food category recognition.Information Fusion, 98:101859, 2023

Yudong Zhang, Lijia Deng, Hengde Zhu, Wei Wang, Zeyu Ren, Qinghua Zhou, Siyuan Lu, Shiting Sun, Ziquan Zhu, Juan Manuel Gorriz, et al. Deep learning in food category recognition.Information Fusion, 98:101859, 2023. 2

work page 2023
[69]

Artificial intelligence applications to measure food and nu- trient intakes: scoping review.Journal of medical Internet research, 26:e54557, 2024

Jiakun Zheng, Junjie Wang, Jing Shen, and Ruopeng An. Artificial intelligence applications to measure food and nu- trient intakes: scoping review.Journal of medical Internet research, 26:e54557, 2024. 1

work page 2024
[70]

Towards automatic learning of procedures from web instructional videos

Luowei Zhou, Chenliang Xu, and Jason Corso. Towards automatic learning of procedures from web instructional videos. InProceedings of the AAAI Conference on Artificial Intelligence, 2018. 4

work page 2018
[71]

Defnet: Dual-branch enhanced feature fusion network for rgb-t crowd counting.IEEE Transactions on Intelligent Transportation Systems, 23(12):24540–24549, 2022

Wujie Zhou, Yi Pan, Jingsheng Lei, Lv Ye, and Lu Yu. Defnet: Dual-branch enhanced feature fusion network for rgb-t crowd counting.IEEE Transactions on Intelligent Transportation Systems, 23(12):24540–24549, 2022. 7

work page 2022

[1] [1]

Explainable Artificial Intelligence Techniques for Interpretation of Food Models: a Review

Leonardo Arrighi, Ingrid Alves de Moraes, Marco Zul- lich, Michele Simonato, Douglas Fernandes Barbin, and Sylvio Barbon Junior. Explainable artificial intelligence techniques for interpretation of food datasets: a review.arXiv preprint arXiv:2504.10527, 2025. 2

work page internal anchor Pith review Pith/arXiv arXiv 2025

[2] [2]

Menu-match: Restaurant-specific food logging from images

Oscar Beijbom, Neel Joshi, Dan Morris, Scott Saponas, and Siddharth Khullar. Menu-match: Restaurant-specific food logging from images. In2015 IEEE Winter Conference on Applications of Computer Vision, pages 844–851, 2015. 4

work page 2015

[3] [3]

Cross-modal hierarchical interaction network for rgb-d salient object detection.Pattern Recogni- tion, 136:109194, 2023

Hongbo Bi, Ranwan Wu, Ziqi Liu, Huihui Zhu, Cong Zhang, and Tian-Zhu Xiang. Cross-modal hierarchical interaction network for rgb-d salient object detection.Pattern Recogni- tion, 136:109194, 2023. 7

work page 2023

[4] [4]

2d prediction of the nutritional compo- sition of dishes from food images: Deep learning algorithm selection and data curation beyond the nutrition5k project

Rachele Bianco et al. 2d prediction of the nutritional compo- sition of dishes from food images: Deep learning algorithm selection and data curation beyond the nutrition5k project. Nutrients, 17(13):2196, 2025. 2

work page 2025

[5] [5]

Food-101–mining discriminative components with random forests

Lukas Bossard, Matthieu Guillaumin, and Luc Van Gool. Food-101–mining discriminative components with random forests. InEuropean Conference on Computer Vision, pages 446–461. Springer, 2014. 2, 4

work page 2014

[6] [6]

Deep-based ingredi- ent recognition for cooking recipe retrieval

Jingjing Chen and Chong-Wah Ngo. Deep-based ingredi- ent recognition for cooking recipe retrieval. InProceedings of the 24th ACM International Conference on Multimedia, pages 32–41, 2016. 2, 4

work page 2016

[7] [7]

Metafood3d: Large 3d food object dataset with nutrition values.arXiv e-prints, pages arXiv–2409,

Yuhao Chen et al. Metafood3d: Large 3d food object dataset with nutrition values.arXiv e-prints, pages arXiv–2409,

work page

[8] [8]

Food recognition and calorie estimation using machine learning

Siddhartha Chinthala, Prem Kumar Erla, Akshaya Dongari, Ajay Bantu, Sai Ganesh Chityala, and M Saravanan. Food recognition and calorie estimation using machine learning. International Journal of Engineering & Extended Technolo- gies Research, 8(2):480–488, 2026. 2

work page 2026

[9] [9]

Advancements in using ai for dietary assessment based on food images: scop- ing review.Journal of Medical Internet Research, 26: e51432, 2024

Phawinpon Chotwanvirat, Aree Prachansuwan, Pimnapanut Sridonpai, and Wantanee Kriengsinyos. Advancements in using ai for dietary assessment based on food images: scop- ing review.Journal of Medical Internet Research, 26: e51432, 2024. 2

work page 2024

[10] [10]

arXiv preprint arXiv:2602.24240 (2026)

Chengyan Deng, Zhangquan Chen, Li Yu, Kai Zhang, Xue Zhou, and Wang Zhang. Joint geometric and trajectory con- sistency learning for one-step real-world super-resolution. arXiv preprint arXiv:2602.24240, 2026. 1

work page arXiv 2026

[11] [11]

Ihmambasr: An importance-guided hierarchi- cal mamba with dynamic prompt for single image super- resolution.Pattern Recognition, page 113057, 2026

Chengyan Deng, Kai Zhang, Lieqiang Yang, Wang Zhang, and Yu Li. Ihmambasr: An importance-guided hierarchi- cal mamba with dynamic prompt for single image super- resolution.Pattern Recognition, page 113057, 2026. 1

work page 2026

[12] [12]

A step forward in food science, technology and industry using artificial intelligence.Trends in Food Science & Technology, 143:104286, 2024

Rezvan Esmaeily, Mohammad Amin Razavi, and Seyed Hadi Razavi. A step forward in food science, technology and industry using artificial intelligence.Trends in Food Science & Technology, 143:104286, 2024. 2

work page 2024

[13] [13]

Single-view food portion estimation based on geometric models

Shaobo Fang, Chang Liu, Fengqing Zhu, Edward J Delp, and Carol J Boushey. Single-view food portion estimation based on geometric models. In2015 IEEE International Sympo- sium on Multimedia (ISM), pages 385–390, 2015. 4

work page 2015

[14] [14]

Ingredient-guided rgb-d fusion network for nutritional assessment.IEEE Transactions on AgriFood Electronics, 2024

Zhihui Feng et al. Ingredient-guided rgb-d fusion network for nutritional assessment.IEEE Transactions on AgriFood Electronics, 2024. 1, 3, 8

work page 2024

[15] [15]

Navigating weight prediction with diet diary

Yinxuan Gui, Bin Zhu, Jingjing Chen, Chong Wah Ngo, and Yu-Gang Jiang. Navigating weight prediction with diet diary. InProceedings of the 32nd ACM International Conference on Multimedia, pages 127–136, 2024. 2

work page 2024

[16] [16]

Dpf-nutrition: Food nutrition estimation via depth prediction and fusion.Foods, 12(23), 2023

Yuzhe Han, Qimin Cheng, Wenjin Wu, and Ziyang Huang. Dpf-nutrition: Food nutrition estimation via depth prediction and fusion.Foods, 12(23), 2023. 1, 2, 7, 8

work page 2023

[17] [17]

Deep residual learning for image recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. InProceed- ings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016. 7

work page 2016

[18] [18]

Searching for mo- bilenetv3

Andrew Howard, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang, Yukun Zhu, Ruoming Pang, Vijay Vasudevan, et al. Searching for mo- bilenetv3. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 1314–1324, 2019. 7

work page 2019

[19] [19]

Weinberger

Gao Huang, Zhuang Liu, Laurens van der Maaten, and Kil- ian Q. Weinberger. Densely connected convolutional net- works. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017. 7

work page 2017

[20] [20]

Psychometric testing of the teacher food and nutrition-related health and wellbe- ing questionnaire.BMC Public Health, 2026

Tammie Jakstas, Andrew Miller, Vanessa A Shrewsbury, Tamara Bucher, and Clare E Collins. Psychometric testing of the teacher food and nutrition-related health and wellbe- ing questionnaire.BMC Public Health, 2026. 1

work page 2026

[21] [21]

Rode: Linear rectified mixture of diverse experts for food large multi-modal models.arXiv preprint arXiv:2407.12730, 2024

Pengkun Jiao, Xinlan Wu, Bin Zhu, Jingjing Chen, Chong- Wah Ngo, and Yugang Jiang. Rode: Linear rectified mixture of diverse experts for food large multi-modal models.arXiv preprint arXiv:2407.12730, 2024. 2, 3, 4, 7

work page arXiv 2024

[22] [22]

A review of image-based food recognition and vol- ume estimation artificial intelligence systems.IEEE Reviews in Biomedical Engineering, 17:136–152, 2023

Fotios S Konstantakopoulos, Eleni I Georga, and Dimitrios I Fotiadis. A review of image-based food recognition and vol- ume estimation artificial intelligence systems.IEEE Reviews in Biomedical Engineering, 17:136–152, 2023. 1

work page 2023

[23] [23]

Tritransnet: Rgb-d salient object detection with a triplet transformer embedding network

Zhengyi Liu, Yuan Wang, Zhengzheng Tu, Yun Xiao, and Bin Tang. Tritransnet: Rgb-d salient object detection with a triplet transformer embedding network. InProceedings of the 29th ACM international conference on multimedia, pages 4481–4490, 2021. 7

work page 2021

[24] [24]

A convnet for the 2020s

Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feicht- enhofer, Trevor Darrell, and Saining Xie. A convnet for the 2020s. InProceedings of the IEEE/CVF Conference on Com- puter Vision and Pattern Recognition, pages 11976–11986,

work page

[25] [25]

Swin transformer: Hierarchical vision trans- former using shifted windows

Ze Liu et al. Swin transformer: Hierarchical vision trans- former using shifted windows. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 10012–10022, 2021. 7

work page 2021

[26] [26]

Food nutri- tion estimation with rgb-d fusion module and bidirectional feature pyramid network.Multimedia Systems, 31(2):1–11,

Boyuan Ma, Donglin Zhang, and Xiao-Jun Wu. Food nutri- tion estimation with rgb-d fusion module and bidirectional feature pyramid network.Multimedia Systems, 31(2):1–11,

work page

[27] [27]

You are what you eat: Ex- ploring rich recipe information for cross-region food anal- ysis.IEEE Transactions on Multimedia, 20(4):950–964,

Weiqing Min, Bing-Kun Bao, Shuhuan Mei, Yaohui Zhu, Yong Rui, and Shuqiang Jiang. You are what you eat: Ex- ploring rich recipe information for cross-region food anal- ysis.IEEE Transactions on Multimedia, 20(4):950–964,

work page

[28] [28]

A survey on food computing.ACM Computing Surveys, 52(5):1–36, 2019

Weiqing Min, Shuqiang Jiang, Linhu Liu, Yong Rui, and Ramesh Jain. A survey on food computing.ACM Computing Surveys, 52(5):1–36, 2019. 1

work page 2019

[29] [29]

Ingredient-guided cascaded multi-attention network for food recognition

Weiqing Min, Linhu Liu, Zhengdong Luo, and Shuqiang Jiang. Ingredient-guided cascaded multi-attention network for food recognition. InProceedings of the 27th ACM In- ternational Conference on Multimedia, pages 1331–1339,

work page

[30] [30]

Isia food-500: A dataset for large-scale food recognition via stacked global-local attention network

Weiqing Min et al. Isia food-500: A dataset for large-scale food recognition via stacked global-local attention network. InProceedings of the 28th ACM International Conference on Multimedia, pages 393–401, 2020. 2, 4

work page 2020

[31] [31]

Large scale visual food recognition.IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(8):9932–9949, 2023

Weiqing Min et al. Large scale visual food recognition.IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(8):9932–9949, 2023. 2, 4

work page 2023

[32] [32]

Food and nutrition in the maha strategy—promise and peril.JAMA, 335(2):119–121, 2026

Dariush Mozaffarian, Emily A Callahan, and William H Frist. Food and nutrition in the maha strategy—promise and peril.JAMA, 335(2):119–121, 2026. 1

work page 2026

[33] [33]

Ingredient-guided multi-modal in- teraction and refinement network for rgb-d food nutrition as- sessment.Digital Signal Processing, 153:104664, 2024

Fudong Nian, Yujie Hu, Yanhong Gu, Zhize Wu, Shimeng Yang, and Jianhua Shu. Ingredient-guided multi-modal in- teraction and refinement network for rgb-d food nutrition as- sessment.Digital Signal Processing, 153:104664, 2024. 7

work page 2024

[34] [34]

A framework for food recognition and pre- dicting its nutritional value through convolution neural net- work

Deepak NR et al. A framework for food recognition and pre- dicting its nutritional value through convolution neural net- work. InProceedings of the International Conference on Innovative Computing & Communication, page 6, 2022. 3

work page 2022

[35] [35]

Dietary intake assess- ment using a novel, generic meal–based recall and a 24-hour recall: Comparison study.Journal of Medical Internet Re- search, 26:e48817, 2024

Cathal O’Hara and Eileen R Gibney. Dietary intake assess- ment using a novel, generic meal–based recall and a 24-hour recall: Comparison study.Journal of Medical Internet Re- search, 26:e48817, 2024. 1

work page 2024

[36] [36]

Fmifood: Multi-modal contrastive learning for food image classifica- tion

Xinyue Pan, Jiangpeng He, and Fengqing Zhu. Fmifood: Multi-modal contrastive learning for food image classifica- tion. In2024 IEEE 26th International Workshop on Multi- media Signal Processing (MMSP), pages 1–6, 2024. 2

work page 2024

[37] [37]

Advancing food nutrition estimation via visual-ingredient feature fusion

Huiyan Qi, Bin Zhu, Chong-Wah Ngo, Jingjing Chen, and Ee-Peng Lim. Advancing food nutrition estimation via visual-ingredient feature fusion. InProceedings of the 2025 International Conference on Multimedia Retrieval, pages 1091–1099, 2025. 1, 2, 4

work page 2025

[38] [38]

Machine learning-driven precision nutrition: A paradigm evolution in dietary assessment and intervention

Wenbin Quan, Jingbo Zhou, Juan Wang, Jihong Huang, and Liping Du. Machine learning-driven precision nutrition: A paradigm evolution in dietary assessment and intervention. Nutrients, 18(1):45, 2025. 1

work page 2025

[39] [39]

Concerns around ev- idence that food processing should be included in dietary guidance.Nature Medicine, pages 1–3, 2026

Eric Robinson and Ciar ´an G Forde. Concerns around ev- idence that food processing should be included in dietary guidance.Nature Medicine, pages 1–3, 2026. 2

work page 2026

[40] [40]

Are vision-language mod- els ready for dietary assessment? exploring the next frontier in ai-powered food image recognition

Sergio Romero-Tapiador et al. Are vision-language mod- els ready for dietary assessment? exploring the next frontier in ai-powered food image recognition. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 430–439, 2025. 2

work page 2025

[41] [41]

Learning cross-modal embeddings for cooking recipes and food images

Amaia Salvador et al. Learning cross-modal embeddings for cooking recipes and food images. InProceedings of the IEEE Conference on Computer Vision and Pattern Recogni- tion, pages 3020–3028, 2017. 2, 4

work page 2017

[42] [42]

Rapid non-destructive analysis of food nutrient content using swin-nutrition.Foods, 11(21), 2022

Wenjing Shao, Sujuan Hou, Weikuan Jia, and Yuanjie Zheng. Rapid non-destructive analysis of food nutrient content using swin-nutrition.Foods, 11(21), 2022. 1, 2, 3, 7

work page 2022

[43] [43]

Vision- based food nutrition estimation via rgb-d fusion network

Wenjing Shao, Weiqing Min, Sujuan Hou, Mengjiang Luo, Tianhao Li, Yuanjie Zheng, and Shuqiang Jiang. Vision- based food nutrition estimation via rgb-d fusion network. Food Chemistry, 424:136309, 2023. 7, 8

work page 2023

[44] [44]

An end-to-end food portion estimation framework based on shape reconstruction from monocular image

Zeman Shao, Gautham Vinod, Jiangpeng He, and Fengqing Zhu. An end-to-end food portion estimation framework based on shape reconstruction from monocular image. In 2023 IEEE ICME, pages 942–947, 2023. 1, 3, 7

work page 2023

[45] [45]

Machine learning based approach on food recognition and nutrition estimation.Procedia Computer Science, 174: 448–453, 2020

Zhidong Shen, Adnan Shehzad, Si Chen, Hui Sun, and Jin Liu. Machine learning based approach on food recognition and nutrition estimation.Procedia Computer Science, 174: 448–453, 2020. 1

work page 2020

[46] [46]

Rice nitrogen nutri- tion estimation with rgb images and machine learning meth- ods.Computers and Electronics in Agriculture, 180:105860,

Peihua Shi, Yuan Wang, Jianmin Xu, Yanling Zhao, Baolin Yang, Zhengqi Yuan, and Qingyun Sun. Rice nitrogen nutri- tion estimation with rgb images and machine learning meth- ods.Computers and Electronics in Agriculture, 180:105860,

work page

[47] [47]

Ai-based digital image dietary assessment methods com- pared to humans and ground truth: a systematic review.An- nals of Medicine, 55(2):2273497, 2023

Eleanor Shonkoff, Kelly Copeland Cara, Xuechen Pei, Mei Chung, Shreyas Kamath, Karen Panetta, and Erin Hennessy. Ai-based digital image dietary assessment methods com- pared to humans and ground truth: a systematic review.An- nals of Medicine, 55(2):2273497, 2023. 1

work page 2023

[48] [48]

Minimum days estimation for reliable dietary intake infor- mation: findings from a digital cohort.European Journal of Clinical Nutrition, pages 1–11, 2025

Rohan Singh, Mathieu Th ´eo Eric Verest, and Marcel Salath´e. Minimum days estimation for reliable dietary intake infor- mation: findings from a digital cohort.European Journal of Clinical Nutrition, pages 1–11, 2025. 1

work page 2025

[49] [49]

Mark H. Stone. The cubit: A history and measurement com- mentary.Journal of Anthropology, 2014(1):489757, 2014. 3

work page 2014

[50] [50]

Rethinking the inception archi- tecture for computer vision

Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. Rethinking the inception archi- tecture for computer vision. InProceedings of the IEEE Con- ference on Computer Vision and Pattern Recognition, pages 2818–2826, 2016. 7

work page 2016

[51] [51]

Efficientnet: Rethinking model scaling for convolutional neural networks

Mingxing Tan and Quoc Le. Efficientnet: Rethinking model scaling for convolutional neural networks. InInternational Conference on Machine Learning, pages 6105–6114. PMLR,

work page

[52] [52]

Reasoning-driven food en- ergy estimation via multimodal large language models.Nu- trients, 17(7):1128, 2025

Hikaru Tanabe and Keiji Yanai. Reasoning-driven food en- ergy estimation via multimodal large language models.Nu- trients, 17(7):1128, 2025. 2

work page 2025

[53] [53]

Nutrition5k: Towards automatic nu- tritional understanding of generic food

Quin Thames et al. Nutrition5k: Towards automatic nu- tritional understanding of generic food. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8903–8911, 2021. 2, 4, 7

work page 2021

[54] [54]

Global food security and sustainability issues: the road to 2030 from nutrition and sustainable healthy diets to food systems change.Foods, 13 (2):306, 2024

Theodoros Varzakas and Slim Smaoui. Global food security and sustainability issues: the road to 2030 from nutrition and sustainable healthy diets to food systems change.Foods, 13 (2):306, 2024. 2

work page 2030

[55] [55]

Image based food energy estimation with depth domain adaptation

Gautham Vinod, Zeman Shao, and Fengqing Zhu. Image based food energy estimation with depth domain adaptation. In2022 IEEE 5th International Conference on Multimedia Information Processing and Retrieval, pages 262–267, 2022. 3, 7

work page 2022

[56] [56]

Coarse-to-fine nutrition prediction

Binglu Wang, Tianci Bu, Zaiyi Hu, Le Yang, Yongqiang Zhao, and Xuelong Li. Coarse-to-fine nutrition prediction. IEEE Transactions on Multimedia, 26:3651–3662, 2023. 2, 7

work page 2023

[57] [57]

Smart fibers and textiles for personal health manage- ment.ACS nano, 15(8):12497–12508, 2021

Huimin Wang, Yong Zhang, Xiaoping Liang, and Yingying Zhang. Smart fibers and textiles for personal health manage- ment.ACS nano, 15(8):12497–12508, 2021. 1

work page 2021

[58] [58]

A review on vision-based analysis for automatic dietary assessment.Trends in Food Science & Technology, 122:223–237, 2022

Wei Wang, Weiqing Min, Tianhao Li, Xiaoxiao Dong, Haisheng Li, and Shuqiang Jiang. A review on vision-based analysis for automatic dietary assessment.Trends in Food Science & Technology, 122:223–237, 2022. 1

work page 2022

[59] [59]

Clare Whitton et al. Accuracy of energy and nu- trient intake estimation versus observed intake using 4 technology-assisted dietary assessment methods: a random- ized crossover feeding study.The American journal of clini- cal nutrition, 120(1):196–210, 2024. 1

work page 2024

[60] [60]

Convnext v2: Co-designing and scal- ing convnets with masked autoencoders

Sanghyun Woo et al. Convnext v2: Co-designing and scal- ing convnets with masked autoencoders. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16133–16142, 2023. 7

work page 2023

[61] [61]

A large-scale benchmark for food im- age segmentation

Xiongwei Wu, Xin Fu, Ying Liu, Ee-Peng Lim, Steven CH Hoi, and Qianru Sun. A large-scale benchmark for food im- age segmentation. InProceedings of the 29th ACM Inter- national Conference on Multimedia, pages 506–515, 2021. 2

work page 2021

[62] [62]

Depth any- thing v2.Advances in Neural Information Processing Sys- tems, 37:21875–21911, 2024

Lihe Yang, Bingyi Kang, Zilong Huang, Zhen Zhao, Xiao- gang Xu, Jiashi Feng, and Hengshuang Zhao. Depth any- thing v2.Advances in Neural Information Processing Sys- tems, 37:21875–21911, 2024. 2, 5

work page 2024

[63] [63]

Spatial-aware multi-modal information fu- sion for food nutrition estimation

Dongjian Yu, Weiqing Min, Xin Jin, Qian Jiang, and Shuqiang Jiang. Spatial-aware multi-modal information fu- sion for food nutrition estimation. InProceedings of the 33rd ACM International Conference on Multimedia, page 8863–8871, 2025. 6

work page 2025

[64] [64]

Cross-modality discrepant interaction net- work for rgb-d salient object detection

Chen Zhang et al. Cross-modality discrepant interaction net- work for rgb-d salient object detection. InProceedings of the 29th ACM International Conference on Multimedia, pages 2094–2102, 2021. 7

work page 2094

[65] [65]

Cmx: Cross-modal fusion for rgb-x semantic segmentation with transformers.IEEE Transactions on Intelligent Transportation Systems, 24(12): 14679–14694, 2023

Jiaming Zhang, Huayao Liu, Kailun Yang, Xinxin Hu, Ruip- ing Liu, and Rainer Stiefelhagen. Cmx: Cross-modal fusion for rgb-x semantic segmentation with transformers.IEEE Transactions on Intelligent Transportation Systems, 24(12): 14679–14694, 2023. 7

work page 2023

[66] [66]

Delivering arbitrary-modal semantic segmentation

Jiaming Zhang et al. Delivering arbitrary-modal semantic segmentation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1136– 1147, 2023. 7

work page 2023

[67] [67]

Recent de- sign strategies and applications of small molecule fluorescent probes for food detection.Coordination Chemistry Reviews, 522:216232, 2025

Peng Zhang, Jiali Su, Hui Zhen, Tong Yu, Liangchen Wei, Mingyue Zheng, Chaoyuan Zeng, and Wei Shu. Recent de- sign strategies and applications of small molecule fluorescent probes for food detection.Coordination Chemistry Reviews, 522:216232, 2025. 2

work page 2025

[68] [68]

Deep learning in food category recognition.Information Fusion, 98:101859, 2023

Yudong Zhang, Lijia Deng, Hengde Zhu, Wei Wang, Zeyu Ren, Qinghua Zhou, Siyuan Lu, Shiting Sun, Ziquan Zhu, Juan Manuel Gorriz, et al. Deep learning in food category recognition.Information Fusion, 98:101859, 2023. 2

work page 2023

[69] [69]

Artificial intelligence applications to measure food and nu- trient intakes: scoping review.Journal of medical Internet research, 26:e54557, 2024

Jiakun Zheng, Junjie Wang, Jing Shen, and Ruopeng An. Artificial intelligence applications to measure food and nu- trient intakes: scoping review.Journal of medical Internet research, 26:e54557, 2024. 1

work page 2024

[70] [70]

Towards automatic learning of procedures from web instructional videos

Luowei Zhou, Chenliang Xu, and Jason Corso. Towards automatic learning of procedures from web instructional videos. InProceedings of the AAAI Conference on Artificial Intelligence, 2018. 4

work page 2018

[71] [71]

Defnet: Dual-branch enhanced feature fusion network for rgb-t crowd counting.IEEE Transactions on Intelligent Transportation Systems, 23(12):24540–24549, 2022

Wujie Zhou, Yi Pan, Jingsheng Lei, Lv Ye, and Lu Yu. Defnet: Dual-branch enhanced feature fusion network for rgb-t crowd counting.IEEE Transactions on Intelligent Transportation Systems, 23(12):24540–24549, 2022. 7

work page 2022