OmniFood8K: Single-Image Nutrition Estimation via Hierarchical Frequency-Aligned Fusion
Pith reviewed 2026-05-10 14:51 UTC · model grok-4.3
The pith
Predicting depth from a single RGB image and fusing it with RGB features in the frequency domain enables more accurate food nutrition estimation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By predicting a depth map from a single RGB image and refining it with a Scale-Shift Residual Adapter for scale and structure, then hierarchically aligning and fusing the RGB and depth features in the frequency domain through the Frequency-Aligned Fusion Module, and finally applying a Mask-based Prediction Head to focus on key regions, the method achieves improved nutritional predictions that surpass existing approaches on multiple datasets including the new OmniFood8K.
What carries the argument
The Frequency-Aligned Fusion Module (FAFM) that performs hierarchical alignment and fusion of RGB and depth features in the frequency domain to capture better compositional details for nutrition.
If this is right
- Nutrition estimation becomes feasible using only standard camera photos in daily settings.
- Coverage expands to Chinese and other non-Western cuisines through the dedicated dataset.
- Synthetic data with preserved labels helps train models on varied food compositions.
- Frequency domain processing of multimodal features improves accuracy for ingredient-based predictions.
Where Pith is reading between the lines
- Such a system could power mobile apps that scan meals for instant dietary feedback.
- The fusion technique might transfer to estimating other properties like freshness or allergens from photos.
- Further gains could come from integrating this with real depth data when available or improving the initial depth prediction.
- Large-scale synthetic data generation may reduce reliance on expensive manual annotations for similar vision tasks.
Load-bearing premise
The synthetic dataset must preserve precise nutritional labels while adding realistic variations, and the frequency fusion with predicted depth must deliver accuracy gains over standard RGB processing.
What would settle it
A test where the full model is compared to a version without the frequency fusion module on the OmniFood8K validation set, and no reduction in prediction error for nutrients like calories or protein is observed.
Figures
read the original abstract
Accurate estimation of food nutrition plays a vital role in promoting healthy dietary habits and personalized diet management. Most existing food datasets primarily focus on Western cuisines and lack sufficient coverage of Chinese dishes, which restricts accurate nutritional estimation for Chinese meals. Moreover, many state-of-the-art nutrition prediction methods rely on depth sensors, restricting their applicability in daily scenarios. To address these limitations, we introduce OmniFood8K, a comprehensive multimodal dataset comprising 8,036 food samples, each with detailed nutritional annotations and multi-view images. In addition, to enhance models' capability in nutritional prediction, we construct NutritionSynth-115K, a large-scale synthetic dataset that introduces compositional variations while preserving precise nutritional labels. Moreover, we propose an end-to-end framework for nutritional prediction from a single RGB image. First, we predict a depth map from a single RGB image and design the Scale-Shift Residual Adapter (SSRA) to refine it for global scale consistency and local structural preservation. Second, we propose the Frequency-Aligned Fusion Module (FAFM) to hierarchically align and fuse RGB and depth features in the frequency domain. Finally, we design a Mask-based Prediction Head (MPH) to emphasize key ingredient regions via dynamic channel selection for more accurate prediction. Extensive experiments on multiple datasets demonstrate the superiority of our method over existing approaches. Project homepage: https://yudongjian.github.io/OmniFood8K-food/
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces OmniFood8K, a multimodal dataset of 8,036 food samples with nutritional annotations and multi-view images focused on Chinese cuisines, along with the synthetic NutritionSynth-115K dataset for compositional augmentation. It proposes an end-to-end single-RGB nutrition estimation framework that first predicts and refines a depth map via the Scale-Shift Residual Adapter (SSRA), hierarchically aligns and fuses RGB-depth features in the frequency domain using the Frequency-Aligned Fusion Module (FAFM), and applies a Mask-based Prediction Head (MPH) for ingredient-region emphasis. The central claim is that this pipeline outperforms prior methods on multiple datasets.
Significance. If the quantitative results and ablations hold, the work is significant for enabling practical single-image nutrition estimation without depth sensors and for filling a gap in non-Western food datasets. The combination of synthetic data generation, frequency-domain fusion, and mask-based prediction offers a coherent technical approach that could influence mobile health and dietary applications.
major comments (2)
- [§4.1, Table 1] §4.1 and Table 1: The superiority claim over RGB-only baselines rests on the reported gains from SSRA+FAFM+MPH, but the ablation study does not isolate the contribution of frequency alignment versus simple concatenation; without this breakdown the load-bearing role of FAFM remains unclear.
- [§3.2] §3.2: The assertion that NutritionSynth-115K preserves precise nutritional labels while adding realistic variations is central to training validity, yet the data-generation procedure (ingredient sampling, rendering parameters) is described at a high level without pseudocode or validation metrics against real distributions.
minor comments (3)
- [Figure 3] Figure 3: The frequency-domain visualization would benefit from explicit axis labels and a side-by-side comparison with spatial-domain fusion to clarify the alignment benefit.
- [§2] §2: Several citations to prior food datasets (e.g., Food-101, Nutrition5K) are present but lack discussion of their Western bias statistics, which would strengthen the motivation for OmniFood8K.
- [Eq. (7)] Notation: The definition of the hierarchical frequency alignment loss in Eq. (7) uses an undefined weighting hyperparameter λ; clarify its value and sensitivity.
Simulated Author's Rebuttal
We thank the referee for the constructive and insightful comments. We address each major comment point by point below and will revise the manuscript to incorporate the requested clarifications, which we believe will strengthen the presentation of our contributions.
read point-by-point responses
-
Referee: [§4.1, Table 1] §4.1 and Table 1: The superiority claim over RGB-only baselines rests on the reported gains from SSRA+FAFM+MPH, but the ablation study does not isolate the contribution of frequency alignment versus simple concatenation; without this breakdown the load-bearing role of FAFM remains unclear.
Authors: We agree that the existing ablation table shows the combined effect of the full pipeline but does not isolate the benefit of frequency-domain alignment in FAFM against a direct concatenation baseline. To address this, we will add a new ablation row in the revised Table 1 (and corresponding discussion in §4.1) that replaces FAFM with hierarchical concatenation of RGB and depth features while keeping SSRA and MPH fixed. This will provide a direct comparison and clarify the specific contribution of the frequency alignment mechanism. revision: yes
-
Referee: [§3.2] §3.2: The assertion that NutritionSynth-115K preserves precise nutritional labels while adding realistic variations is central to training validity, yet the data-generation procedure (ingredient sampling, rendering parameters) is described at a high level without pseudocode or validation metrics against real distributions.
Authors: We acknowledge that Section 3.2 currently provides only a high-level overview of the synthetic data pipeline. In the revised manuscript we will expand this section with (i) pseudocode for the ingredient sampling and rendering procedure and (ii) quantitative validation metrics, including distributional comparisons (e.g., KL divergence on nutritional vectors and visual feature statistics) between NutritionSynth-115K and the real OmniFood8K samples. These additions will substantiate the claim that precise labels are preserved while realistic variations are introduced. revision: yes
Circularity Check
No significant circularity detected
full rationale
The paper introduces new datasets (OmniFood8K and NutritionSynth-115K) and an end-to-end architecture (SSRA + FAFM + MPH) for single-image nutrition estimation. No equations, derivations, or parameter-fitting steps appear in the provided text that reduce a claimed prediction or result to an input defined by the same data or self-citation. The central claims rest on experimental superiority across multiple datasets, which constitutes external validation rather than an internal self-referential loop. The method description is technically coherent and does not invoke uniqueness theorems, ansatzes smuggled via prior self-work, or renaming of known results as new derivations.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Explainable Artificial Intelligence Techniques for Interpretation of Food Models: a Review
Leonardo Arrighi, Ingrid Alves de Moraes, Marco Zul- lich, Michele Simonato, Douglas Fernandes Barbin, and Sylvio Barbon Junior. Explainable artificial intelligence techniques for interpretation of food datasets: a review.arXiv preprint arXiv:2504.10527, 2025. 2
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[2]
Menu-match: Restaurant-specific food logging from images
Oscar Beijbom, Neel Joshi, Dan Morris, Scott Saponas, and Siddharth Khullar. Menu-match: Restaurant-specific food logging from images. In2015 IEEE Winter Conference on Applications of Computer Vision, pages 844–851, 2015. 4
work page 2015
-
[3]
Hongbo Bi, Ranwan Wu, Ziqi Liu, Huihui Zhu, Cong Zhang, and Tian-Zhu Xiang. Cross-modal hierarchical interaction network for rgb-d salient object detection.Pattern Recogni- tion, 136:109194, 2023. 7
work page 2023
-
[4]
Rachele Bianco et al. 2d prediction of the nutritional compo- sition of dishes from food images: Deep learning algorithm selection and data curation beyond the nutrition5k project. Nutrients, 17(13):2196, 2025. 2
work page 2025
-
[5]
Food-101–mining discriminative components with random forests
Lukas Bossard, Matthieu Guillaumin, and Luc Van Gool. Food-101–mining discriminative components with random forests. InEuropean Conference on Computer Vision, pages 446–461. Springer, 2014. 2, 4
work page 2014
-
[6]
Deep-based ingredi- ent recognition for cooking recipe retrieval
Jingjing Chen and Chong-Wah Ngo. Deep-based ingredi- ent recognition for cooking recipe retrieval. InProceedings of the 24th ACM International Conference on Multimedia, pages 32–41, 2016. 2, 4
work page 2016
-
[7]
Metafood3d: Large 3d food object dataset with nutrition values.arXiv e-prints, pages arXiv–2409,
Yuhao Chen et al. Metafood3d: Large 3d food object dataset with nutrition values.arXiv e-prints, pages arXiv–2409,
-
[8]
Food recognition and calorie estimation using machine learning
Siddhartha Chinthala, Prem Kumar Erla, Akshaya Dongari, Ajay Bantu, Sai Ganesh Chityala, and M Saravanan. Food recognition and calorie estimation using machine learning. International Journal of Engineering & Extended Technolo- gies Research, 8(2):480–488, 2026. 2
work page 2026
-
[9]
Phawinpon Chotwanvirat, Aree Prachansuwan, Pimnapanut Sridonpai, and Wantanee Kriengsinyos. Advancements in using ai for dietary assessment based on food images: scop- ing review.Journal of Medical Internet Research, 26: e51432, 2024. 2
work page 2024
-
[10]
arXiv preprint arXiv:2602.24240 (2026)
Chengyan Deng, Zhangquan Chen, Li Yu, Kai Zhang, Xue Zhou, and Wang Zhang. Joint geometric and trajectory con- sistency learning for one-step real-world super-resolution. arXiv preprint arXiv:2602.24240, 2026. 1
-
[11]
Chengyan Deng, Kai Zhang, Lieqiang Yang, Wang Zhang, and Yu Li. Ihmambasr: An importance-guided hierarchi- cal mamba with dynamic prompt for single image super- resolution.Pattern Recognition, page 113057, 2026. 1
work page 2026
-
[12]
Rezvan Esmaeily, Mohammad Amin Razavi, and Seyed Hadi Razavi. A step forward in food science, technology and industry using artificial intelligence.Trends in Food Science & Technology, 143:104286, 2024. 2
work page 2024
-
[13]
Single-view food portion estimation based on geometric models
Shaobo Fang, Chang Liu, Fengqing Zhu, Edward J Delp, and Carol J Boushey. Single-view food portion estimation based on geometric models. In2015 IEEE International Sympo- sium on Multimedia (ISM), pages 385–390, 2015. 4
work page 2015
-
[14]
Zhihui Feng et al. Ingredient-guided rgb-d fusion network for nutritional assessment.IEEE Transactions on AgriFood Electronics, 2024. 1, 3, 8
work page 2024
-
[15]
Navigating weight prediction with diet diary
Yinxuan Gui, Bin Zhu, Jingjing Chen, Chong Wah Ngo, and Yu-Gang Jiang. Navigating weight prediction with diet diary. InProceedings of the 32nd ACM International Conference on Multimedia, pages 127–136, 2024. 2
work page 2024
-
[16]
Dpf-nutrition: Food nutrition estimation via depth prediction and fusion.Foods, 12(23), 2023
Yuzhe Han, Qimin Cheng, Wenjin Wu, and Ziyang Huang. Dpf-nutrition: Food nutrition estimation via depth prediction and fusion.Foods, 12(23), 2023. 1, 2, 7, 8
work page 2023
-
[17]
Deep residual learning for image recognition
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. InProceed- ings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016. 7
work page 2016
-
[18]
Andrew Howard, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang, Yukun Zhu, Ruoming Pang, Vijay Vasudevan, et al. Searching for mo- bilenetv3. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 1314–1324, 2019. 7
work page 2019
-
[19]
Gao Huang, Zhuang Liu, Laurens van der Maaten, and Kil- ian Q. Weinberger. Densely connected convolutional net- works. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017. 7
work page 2017
-
[20]
Tammie Jakstas, Andrew Miller, Vanessa A Shrewsbury, Tamara Bucher, and Clare E Collins. Psychometric testing of the teacher food and nutrition-related health and wellbe- ing questionnaire.BMC Public Health, 2026. 1
work page 2026
-
[21]
Pengkun Jiao, Xinlan Wu, Bin Zhu, Jingjing Chen, Chong- Wah Ngo, and Yugang Jiang. Rode: Linear rectified mixture of diverse experts for food large multi-modal models.arXiv preprint arXiv:2407.12730, 2024. 2, 3, 4, 7
-
[22]
Fotios S Konstantakopoulos, Eleni I Georga, and Dimitrios I Fotiadis. A review of image-based food recognition and vol- ume estimation artificial intelligence systems.IEEE Reviews in Biomedical Engineering, 17:136–152, 2023. 1
work page 2023
-
[23]
Tritransnet: Rgb-d salient object detection with a triplet transformer embedding network
Zhengyi Liu, Yuan Wang, Zhengzheng Tu, Yun Xiao, and Bin Tang. Tritransnet: Rgb-d salient object detection with a triplet transformer embedding network. InProceedings of the 29th ACM international conference on multimedia, pages 4481–4490, 2021. 7
work page 2021
-
[24]
Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feicht- enhofer, Trevor Darrell, and Saining Xie. A convnet for the 2020s. InProceedings of the IEEE/CVF Conference on Com- puter Vision and Pattern Recognition, pages 11976–11986,
-
[25]
Swin transformer: Hierarchical vision trans- former using shifted windows
Ze Liu et al. Swin transformer: Hierarchical vision trans- former using shifted windows. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 10012–10022, 2021. 7
work page 2021
-
[26]
Boyuan Ma, Donglin Zhang, and Xiao-Jun Wu. Food nutri- tion estimation with rgb-d fusion module and bidirectional feature pyramid network.Multimedia Systems, 31(2):1–11,
-
[27]
Weiqing Min, Bing-Kun Bao, Shuhuan Mei, Yaohui Zhu, Yong Rui, and Shuqiang Jiang. You are what you eat: Ex- ploring rich recipe information for cross-region food anal- ysis.IEEE Transactions on Multimedia, 20(4):950–964,
-
[28]
A survey on food computing.ACM Computing Surveys, 52(5):1–36, 2019
Weiqing Min, Shuqiang Jiang, Linhu Liu, Yong Rui, and Ramesh Jain. A survey on food computing.ACM Computing Surveys, 52(5):1–36, 2019. 1
work page 2019
-
[29]
Ingredient-guided cascaded multi-attention network for food recognition
Weiqing Min, Linhu Liu, Zhengdong Luo, and Shuqiang Jiang. Ingredient-guided cascaded multi-attention network for food recognition. InProceedings of the 27th ACM In- ternational Conference on Multimedia, pages 1331–1339,
-
[30]
Isia food-500: A dataset for large-scale food recognition via stacked global-local attention network
Weiqing Min et al. Isia food-500: A dataset for large-scale food recognition via stacked global-local attention network. InProceedings of the 28th ACM International Conference on Multimedia, pages 393–401, 2020. 2, 4
work page 2020
-
[31]
Weiqing Min et al. Large scale visual food recognition.IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(8):9932–9949, 2023. 2, 4
work page 2023
-
[32]
Food and nutrition in the maha strategy—promise and peril.JAMA, 335(2):119–121, 2026
Dariush Mozaffarian, Emily A Callahan, and William H Frist. Food and nutrition in the maha strategy—promise and peril.JAMA, 335(2):119–121, 2026. 1
work page 2026
-
[33]
Fudong Nian, Yujie Hu, Yanhong Gu, Zhize Wu, Shimeng Yang, and Jianhua Shu. Ingredient-guided multi-modal in- teraction and refinement network for rgb-d food nutrition as- sessment.Digital Signal Processing, 153:104664, 2024. 7
work page 2024
-
[34]
Deepak NR et al. A framework for food recognition and pre- dicting its nutritional value through convolution neural net- work. InProceedings of the International Conference on Innovative Computing & Communication, page 6, 2022. 3
work page 2022
-
[35]
Cathal O’Hara and Eileen R Gibney. Dietary intake assess- ment using a novel, generic meal–based recall and a 24-hour recall: Comparison study.Journal of Medical Internet Re- search, 26:e48817, 2024. 1
work page 2024
-
[36]
Fmifood: Multi-modal contrastive learning for food image classifica- tion
Xinyue Pan, Jiangpeng He, and Fengqing Zhu. Fmifood: Multi-modal contrastive learning for food image classifica- tion. In2024 IEEE 26th International Workshop on Multi- media Signal Processing (MMSP), pages 1–6, 2024. 2
work page 2024
-
[37]
Advancing food nutrition estimation via visual-ingredient feature fusion
Huiyan Qi, Bin Zhu, Chong-Wah Ngo, Jingjing Chen, and Ee-Peng Lim. Advancing food nutrition estimation via visual-ingredient feature fusion. InProceedings of the 2025 International Conference on Multimedia Retrieval, pages 1091–1099, 2025. 1, 2, 4
work page 2025
-
[38]
Wenbin Quan, Jingbo Zhou, Juan Wang, Jihong Huang, and Liping Du. Machine learning-driven precision nutrition: A paradigm evolution in dietary assessment and intervention. Nutrients, 18(1):45, 2025. 1
work page 2025
-
[39]
Eric Robinson and Ciar ´an G Forde. Concerns around ev- idence that food processing should be included in dietary guidance.Nature Medicine, pages 1–3, 2026. 2
work page 2026
-
[40]
Sergio Romero-Tapiador et al. Are vision-language mod- els ready for dietary assessment? exploring the next frontier in ai-powered food image recognition. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 430–439, 2025. 2
work page 2025
-
[41]
Learning cross-modal embeddings for cooking recipes and food images
Amaia Salvador et al. Learning cross-modal embeddings for cooking recipes and food images. InProceedings of the IEEE Conference on Computer Vision and Pattern Recogni- tion, pages 3020–3028, 2017. 2, 4
work page 2017
-
[42]
Rapid non-destructive analysis of food nutrient content using swin-nutrition.Foods, 11(21), 2022
Wenjing Shao, Sujuan Hou, Weikuan Jia, and Yuanjie Zheng. Rapid non-destructive analysis of food nutrient content using swin-nutrition.Foods, 11(21), 2022. 1, 2, 3, 7
work page 2022
-
[43]
Vision- based food nutrition estimation via rgb-d fusion network
Wenjing Shao, Weiqing Min, Sujuan Hou, Mengjiang Luo, Tianhao Li, Yuanjie Zheng, and Shuqiang Jiang. Vision- based food nutrition estimation via rgb-d fusion network. Food Chemistry, 424:136309, 2023. 7, 8
work page 2023
-
[44]
An end-to-end food portion estimation framework based on shape reconstruction from monocular image
Zeman Shao, Gautham Vinod, Jiangpeng He, and Fengqing Zhu. An end-to-end food portion estimation framework based on shape reconstruction from monocular image. In 2023 IEEE ICME, pages 942–947, 2023. 1, 3, 7
work page 2023
-
[45]
Zhidong Shen, Adnan Shehzad, Si Chen, Hui Sun, and Jin Liu. Machine learning based approach on food recognition and nutrition estimation.Procedia Computer Science, 174: 448–453, 2020. 1
work page 2020
-
[46]
Peihua Shi, Yuan Wang, Jianmin Xu, Yanling Zhao, Baolin Yang, Zhengqi Yuan, and Qingyun Sun. Rice nitrogen nutri- tion estimation with rgb images and machine learning meth- ods.Computers and Electronics in Agriculture, 180:105860,
-
[47]
Eleanor Shonkoff, Kelly Copeland Cara, Xuechen Pei, Mei Chung, Shreyas Kamath, Karen Panetta, and Erin Hennessy. Ai-based digital image dietary assessment methods com- pared to humans and ground truth: a systematic review.An- nals of Medicine, 55(2):2273497, 2023. 1
work page 2023
-
[48]
Rohan Singh, Mathieu Th ´eo Eric Verest, and Marcel Salath´e. Minimum days estimation for reliable dietary intake infor- mation: findings from a digital cohort.European Journal of Clinical Nutrition, pages 1–11, 2025. 1
work page 2025
-
[49]
Mark H. Stone. The cubit: A history and measurement com- mentary.Journal of Anthropology, 2014(1):489757, 2014. 3
work page 2014
-
[50]
Rethinking the inception archi- tecture for computer vision
Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. Rethinking the inception archi- tecture for computer vision. InProceedings of the IEEE Con- ference on Computer Vision and Pattern Recognition, pages 2818–2826, 2016. 7
work page 2016
-
[51]
Efficientnet: Rethinking model scaling for convolutional neural networks
Mingxing Tan and Quoc Le. Efficientnet: Rethinking model scaling for convolutional neural networks. InInternational Conference on Machine Learning, pages 6105–6114. PMLR,
-
[52]
Hikaru Tanabe and Keiji Yanai. Reasoning-driven food en- ergy estimation via multimodal large language models.Nu- trients, 17(7):1128, 2025. 2
work page 2025
-
[53]
Nutrition5k: Towards automatic nu- tritional understanding of generic food
Quin Thames et al. Nutrition5k: Towards automatic nu- tritional understanding of generic food. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8903–8911, 2021. 2, 4, 7
work page 2021
-
[54]
Theodoros Varzakas and Slim Smaoui. Global food security and sustainability issues: the road to 2030 from nutrition and sustainable healthy diets to food systems change.Foods, 13 (2):306, 2024. 2
work page 2030
-
[55]
Image based food energy estimation with depth domain adaptation
Gautham Vinod, Zeman Shao, and Fengqing Zhu. Image based food energy estimation with depth domain adaptation. In2022 IEEE 5th International Conference on Multimedia Information Processing and Retrieval, pages 262–267, 2022. 3, 7
work page 2022
-
[56]
Coarse-to-fine nutrition prediction
Binglu Wang, Tianci Bu, Zaiyi Hu, Le Yang, Yongqiang Zhao, and Xuelong Li. Coarse-to-fine nutrition prediction. IEEE Transactions on Multimedia, 26:3651–3662, 2023. 2, 7
work page 2023
-
[57]
Smart fibers and textiles for personal health manage- ment.ACS nano, 15(8):12497–12508, 2021
Huimin Wang, Yong Zhang, Xiaoping Liang, and Yingying Zhang. Smart fibers and textiles for personal health manage- ment.ACS nano, 15(8):12497–12508, 2021. 1
work page 2021
-
[58]
Wei Wang, Weiqing Min, Tianhao Li, Xiaoxiao Dong, Haisheng Li, and Shuqiang Jiang. A review on vision-based analysis for automatic dietary assessment.Trends in Food Science & Technology, 122:223–237, 2022. 1
work page 2022
-
[59]
Clare Whitton et al. Accuracy of energy and nu- trient intake estimation versus observed intake using 4 technology-assisted dietary assessment methods: a random- ized crossover feeding study.The American journal of clini- cal nutrition, 120(1):196–210, 2024. 1
work page 2024
-
[60]
Convnext v2: Co-designing and scal- ing convnets with masked autoencoders
Sanghyun Woo et al. Convnext v2: Co-designing and scal- ing convnets with masked autoencoders. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16133–16142, 2023. 7
work page 2023
-
[61]
A large-scale benchmark for food im- age segmentation
Xiongwei Wu, Xin Fu, Ying Liu, Ee-Peng Lim, Steven CH Hoi, and Qianru Sun. A large-scale benchmark for food im- age segmentation. InProceedings of the 29th ACM Inter- national Conference on Multimedia, pages 506–515, 2021. 2
work page 2021
-
[62]
Depth any- thing v2.Advances in Neural Information Processing Sys- tems, 37:21875–21911, 2024
Lihe Yang, Bingyi Kang, Zilong Huang, Zhen Zhao, Xiao- gang Xu, Jiashi Feng, and Hengshuang Zhao. Depth any- thing v2.Advances in Neural Information Processing Sys- tems, 37:21875–21911, 2024. 2, 5
work page 2024
-
[63]
Spatial-aware multi-modal information fu- sion for food nutrition estimation
Dongjian Yu, Weiqing Min, Xin Jin, Qian Jiang, and Shuqiang Jiang. Spatial-aware multi-modal information fu- sion for food nutrition estimation. InProceedings of the 33rd ACM International Conference on Multimedia, page 8863–8871, 2025. 6
work page 2025
-
[64]
Cross-modality discrepant interaction net- work for rgb-d salient object detection
Chen Zhang et al. Cross-modality discrepant interaction net- work for rgb-d salient object detection. InProceedings of the 29th ACM International Conference on Multimedia, pages 2094–2102, 2021. 7
work page 2094
-
[65]
Jiaming Zhang, Huayao Liu, Kailun Yang, Xinxin Hu, Ruip- ing Liu, and Rainer Stiefelhagen. Cmx: Cross-modal fusion for rgb-x semantic segmentation with transformers.IEEE Transactions on Intelligent Transportation Systems, 24(12): 14679–14694, 2023. 7
work page 2023
-
[66]
Delivering arbitrary-modal semantic segmentation
Jiaming Zhang et al. Delivering arbitrary-modal semantic segmentation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1136– 1147, 2023. 7
work page 2023
-
[67]
Peng Zhang, Jiali Su, Hui Zhen, Tong Yu, Liangchen Wei, Mingyue Zheng, Chaoyuan Zeng, and Wei Shu. Recent de- sign strategies and applications of small molecule fluorescent probes for food detection.Coordination Chemistry Reviews, 522:216232, 2025. 2
work page 2025
-
[68]
Deep learning in food category recognition.Information Fusion, 98:101859, 2023
Yudong Zhang, Lijia Deng, Hengde Zhu, Wei Wang, Zeyu Ren, Qinghua Zhou, Siyuan Lu, Shiting Sun, Ziquan Zhu, Juan Manuel Gorriz, et al. Deep learning in food category recognition.Information Fusion, 98:101859, 2023. 2
work page 2023
-
[69]
Jiakun Zheng, Junjie Wang, Jing Shen, and Ruopeng An. Artificial intelligence applications to measure food and nu- trient intakes: scoping review.Journal of medical Internet research, 26:e54557, 2024. 1
work page 2024
-
[70]
Towards automatic learning of procedures from web instructional videos
Luowei Zhou, Chenliang Xu, and Jason Corso. Towards automatic learning of procedures from web instructional videos. InProceedings of the AAAI Conference on Artificial Intelligence, 2018. 4
work page 2018
-
[71]
Wujie Zhou, Yi Pan, Jingsheng Lei, Lv Ye, and Lu Yu. Defnet: Dual-branch enhanced feature fusion network for rgb-t crowd counting.IEEE Transactions on Intelligent Transportation Systems, 23(12):24540–24549, 2022. 7
work page 2022
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.