
arxiv: 2509.19367 · v1 · submitted 2025-09-19 · 📡 eess.SP · cs.LG · stat.ML

Low-Cost Sensor Fusion Framework for Organic Substance Classification and Quality Control Using Classification Methods

Pith reviewed 2026-05-18 16:37 UTC · model grok-4.3

classification 📡 eess.SP · cs.LG · stat.ML
keywords sensor fusion · organic classification · Arduino · machine learning · quality control · low-cost sensors · random forest · neural network

The pith

Low-cost Arduino sensor fusion with machine learning classifies organic substances at 93 to 94 percent accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds a sensor fusion system using an Arduino Mega 2560 and three off-the-shelf environmental and gas sensors to gather data from organic materials. Data for ten classes (fresh and expired samples of apple juice, onion, garlic, and ginger, plus cinnamon and cardamom) are collected in the lab and processed with correlation analysis and dimensionality reduction. Several machine learning models, including tuned random forest, voting ensembles, and neural networks, are then applied to this dataset. The strongest models reach 93 to 94 percent accuracy on test data, indicating that inexpensive hardware can support reliable classification and quality monitoring of these substances.

Core claim

A standard Arduino Mega 2560 microcontroller equipped with three commercial environmental and gas sensors collects labeled data for ten distinct classes of organic substances, including fresh and expired samples. After correlation-based feature selection and PCA or LDA reduction, supervised classifiers such as support vector machines, decision trees, random forests with tuning, artificial neural networks, and ensemble voting classifiers are trained. The best of these achieve test accuracies in the 93 to 94 percent range, establishing that this low-cost multisensory platform enables practical identification and quality control of organic compounds.
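The ensemble voting step named in the core claim can be illustrated with a minimal sketch. The paper summary does not say whether voting is hard or soft, nor which base models feed the ensemble, so hard (majority) voting over three models is an assumption here, and the predictions are toy values rather than data from the paper.

```python
from collections import Counter


def hard_vote(predictions_per_model):
    """Combine per-model label predictions by majority vote.

    predictions_per_model: list of equal-length lists, one list per model.
    Ties go to the earliest model (in list order) that voted for a tied label.
    """
    n_samples = len(predictions_per_model[0])
    if any(len(p) != n_samples for p in predictions_per_model):
        raise ValueError("all models must predict the same number of samples")
    combined = []
    for votes in zip(*predictions_per_model):
        counts = Counter(votes)
        top = max(counts.values())
        # First vote, in model order, whose label reaches the top count.
        combined.append(next(v for v in votes if counts[v] == top))
    return combined


def accuracy(y_true, y_pred):
    """Fraction of positions where prediction equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy labels and predictions (illustrative only, not taken from the paper).
y_true = ["onion", "garlic", "ginger", "cinnamon", "cardamom"]
svm_pred = ["onion", "onion", "ginger", "cinnamon", "cardamom"]
rf_pred = ["onion", "garlic", "ginger", "cardamom", "cardamom"]
dt_pred = ["garlic", "garlic", "ginger", "cinnamon", "cinnamon"]

ensemble_pred = hard_vote([svm_pred, rf_pred, dt_pred])
print(accuracy(y_true, svm_pred), accuracy(y_true, ensemble_pred))
```

In this toy case each base model makes at least one mistake, but the mistakes fall on different samples, so the majority vote recovers every label. That is the mechanism by which voting can help; it does not show that it helped on the paper's data.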

What carries the argument

The Arduino Mega 2560-based multisensor platform that combines raw sensor outputs with correlation-driven preprocessing and multiple tuned machine learning classifiers.

If this is right

  • The framework supports non-destructive quality checks for organic products using portable equipment.
  • Hyperparameter tuning and ensemble methods improve performance over single models on this sensor data.
  • The collected dataset demonstrates feasibility for similar classification tasks with low-cost hardware.
  • Correlation analysis aids in selecting relevant features from environmental and gas sensor readings.
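The correlation-based screening mentioned in the last bullet can be sketched as follows. This assumes Pearson correlation between each feature and an integer-encoded class label, which is consistent with the Figure 3 caption but not confirmed as the authors' exact procedure. The threshold of 0.5, the feature names `gas` and `humidity`, and all values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return cov / (norm_x * norm_y)


def select_features(columns, labels, threshold):
    """Score each feature against the labels and keep those with |r| >= threshold."""
    scores = {name: pearson(values, labels) for name, values in columns.items()}
    kept = [name for name, r in scores.items() if abs(r) >= threshold]
    return scores, kept


# Toy data: three classes encoded as 0, 1, 2 (hypothetical encoding and values).
labels = [0, 0, 1, 1, 2, 2]
columns = {
    "gas": [1.0, 1.2, 2.1, 1.9, 3.0, 3.2],
    "humidity": [40.0, 42.0, 41.0, 39.0, 40.0, 42.0],
    "pressure": [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0],
}
scores, kept = select_features(columns, labels, threshold=0.5)
print(kept)
```

One caveat with this kind of screening: for nominal classes the correlation depends on the arbitrary integer assigned to each class, so a low score does not by itself prove a feature is uninformative.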

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Extending the sensor set or classes could broaden applications to more food items or environmental monitoring.
  • Real-time deployment on microcontrollers might allow on-site decisions without sending samples to labs.
  • The results point to potential cost reductions in supply chain quality assurance for perishables.
  • Combining this with wireless connectivity could create distributed sensing networks for organic quality.

Load-bearing premise

The in-house sensor data from the ten classes sufficiently represents real-world variations so that the models generalize to unseen samples outside the lab.

What would settle it

Collect new sensor readings from the same substance classes under varied conditions, such as different temperatures or different product brands, then check whether model accuracy stays above 85 percent on this fresh data.
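A minimal sketch of that check, assuming predictions from an already trained model are available for each collection condition. The condition names, labels, and predictions are invented for illustration; only the 85 percent bar comes from the text above.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where prediction equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def check_generalization(results_by_condition, threshold=0.85):
    """Map each condition to (accuracy, passes), where passes means accuracy > threshold.

    results_by_condition: dict of condition name -> (y_true, y_pred).
    """
    report = {}
    for condition, (y_true, y_pred) in results_by_condition.items():
        acc = accuracy(y_true, y_pred)
        report[condition] = (acc, acc > threshold)
    return report


# Hypothetical data: 20 samples, half fresh and half expired.
y_true = ["fresh"] * 10 + ["expired"] * 10
baseline_pred = y_true[:19] + ["fresh"]                    # 1 error out of 20
shifted_pred = ["fresh"] * 10 + ["fresh"] * 4 + ["expired"] * 6  # 4 errors out of 20

report = check_generalization({
    "lab_baseline": (y_true, baseline_pred),
    "warmer_room": (y_true, shifted_pred),
})
for condition, (acc, passes) in report.items():
    print(condition, round(acc, 2), "pass" if passes else "fail")
```

Reporting accuracy per condition, rather than pooling all new data, makes it visible which kind of shift (temperature, brand, preparation) breaks the model.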

Figures

Figures reproduced from arXiv: 2509.19367 by Borhan Uddin Chowdhury, Damian Valles, Md Raf E Ul Shougat.

Figure 1. Sensor fusion and data collection flow using Arduino Mega 2560.
    B. Data Preprocessing. The raw sensor data for each organic substance was combined into a single dataset, with the target column labeled according to the substance name (e.g., "onion," "garlic," etc.). Only the target column was categorical; all other features were numeric, and there were no missing values, as the data was collected after allow…
Figure 2. Figure 2 illustrates the complete model development pipeline, including traditional ML workflows, feature set variations, hyperparameter tuning, ensemble construction, and ANN architecture exploration.
    B. Model Evaluation and Feature Selection. Performance analysis was conducted through stratified 5-fold cross-validation across all four dataset versions. While V1 (all features) yielded slightly higher accuracy, app…
Figure 3. Correlation of individual sensor features with the target class label.
    B. Visualizations. Multiple projection techniques were applied to the complete and reduced feature sets to assess class separability and the effects of dimensionality reduction
Figure 6. Cross-validation accuracy comparison of SVM, Random Forest (RF), and Decision Tree (DT) across four dataset versions (V1–V4).
    … caused by ambient drift. Given the minimal performance trade-off, reduced complexity, and improved generalizability, V2 was selected as the final version for model tuning and further evaluation. D. Hyperparameter Tuning. To improve the baseline model’s performance on the selected V2 …
Figure 4. LDA and PCA projection on the complete and reduced feature sets.
Figure 5. t-SNE and UMAP projection on the complete and reduced feature sets.
    C. Cross-Validation Accuracy Comparison. Cross-validation accuracies were compared across four dataset versions (V1–V4) for each classifier (SVM, RF, DT) to evaluate the impact of feature engineering and dimensionality reduction. The results are presented in
Figure 15. Train, validation, and test accuracy of five ANN variants with temperature and pressure dropped.
    When using all available features, the accuracy across all ANN architectures was tightly clustered around 94.00%, reflecting consistent and strong performance. Among the tested configurations, the wider architecture achieved the highest test accuracy of 94.01%, marginally outperforming the other variants. Furt…
Figure 17. Confusion matrix of the wider ANN model with dropped features on the test set.
    Table III demonstrates that the macro-averaged precision, recall, and F1-score were all around 0.94.

    TABLE III. CLASSIFICATION REPORT OF THE WIDER ANN MODEL WITH DROPPED FEATURES ON TEST SET

    class          precision   recall   f1-score   support
    apple_juice    0.9117      0.9035   0.9076     2000
    cardamom       0.9900      0.9945   0.9923     2000
    cinnamon       0.9945      0.9900   0.…
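As a consistency check on the Table III excerpt, the F1-score of each complete row can be recomputed from its precision and recall as F1 = 2PR / (P + R). The sketch below uses only the two rows that survive in full in the excerpt. The macro average over all ten classes cannot be recomputed from this page, so `macro_average` is shown on placeholder values only.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_average(values):
    """Unweighted mean over per-class scores."""
    values = list(values)
    return sum(values) / len(values)


# (precision, recall, reported f1) for the rows fully visible in the excerpt.
rows = {
    "apple_juice": (0.9117, 0.9035, 0.9076),
    "cardamom": (0.9900, 0.9945, 0.9923),
}

recomputed = {name: f1_score(p, r) for name, (p, r, _) in rows.items()}
for name, (_, _, reported) in rows.items():
    print(name, round(recomputed[name], 4), "reported:", reported)

# Placeholder per-class scores, only to show how a macro average is formed.
print(macro_average([0.5, 1.0]))
```

Small differences in the last digit are expected, because the published precision and recall are themselves rounded to four decimals.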
Original abstract

We present a sensor-fusion framework for rapid, non-destructive classification and quality control of organic substances, built on a standard Arduino Mega 2560 microcontroller platform equipped with three commercial environmental and gas sensors. All data used in this study were generated in-house: sensor outputs for ten distinct classes - including fresh and expired samples of apple juice, onion, garlic, and ginger, as well as cinnamon and cardamom - were systematically collected and labeled using this hardware setup, resulting in a unique, application-specific dataset. Correlation analysis was employed as part of the preprocessing pipeline for feature selection. After preprocessing and dimensionality reduction (PCA/LDA), multiple supervised learning models - including Support Vector Machine (SVM), Decision Tree (DT), and Random Forest (RF), each with hyperparameter tuning, as well as an Artificial Neural Network (ANN) and an ensemble voting classifier - were trained and cross-validated on the collected dataset. The best-performing models, including tuned Random Forest, ensemble, and ANN, achieved test accuracies in the 93 to 94 percent range. These results demonstrate that low-cost, multisensory platforms based on the Arduino Mega 2560, combined with advanced machine learning and correlation-driven feature engineering, enable reliable identification and quality control of organic compounds.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents a low-cost sensor-fusion framework built on an Arduino Mega 2560 with three commercial environmental/gas sensors for classifying ten organic substance classes (fresh and expired samples of apple juice, onion, garlic, and ginger, plus cinnamon and cardamom). In-house data are collected, preprocessed via correlation-based feature selection and PCA/LDA, and used to train and cross-validate multiple supervised models (SVM, DT, tuned RF, ANN, ensemble voting classifier); the best models reach 93–94% test accuracy.

Significance. If the reported accuracies prove robust, the work could supply an accessible, microcontroller-based tool for rapid non-destructive quality control of organic materials. The systematic in-house data collection and comparison of several tuned models with dimensionality reduction constitute a practical empirical contribution, though the absence of external validation limits broader impact.

major comments (2)
  1. [Abstract] Abstract: the claim that the framework enables 'reliable identification and quality control' rests on 93–94% test accuracies, yet the abstract (and, by extension, the results) supplies no sample counts per class, sensor model numbers, or explicit checks for batch effects, sensor drift, or ambient-condition variation; without these, the generalization from controlled lab samples to real-world use remains unverified.
  2. [Results] Results section (cross-validation and test-set reporting): the 93–94% accuracies for tuned RF, ensemble, and ANN are obtained entirely on data from a single Arduino setup under laboratory conditions; no independent test set collected under altered humidity, temperature, or sample-preparation protocols is reported, so the central claim that the sensor responses encode persistent class signatures is only moderately supported.
minor comments (2)
  1. [Methods] Methods: specify the exact correlation threshold used for feature selection and the number of components retained in PCA/LDA.
  2. [Results] Table reporting model performance: include standard deviations across cross-validation folds and the total number of samples to allow assessment of statistical reliability.
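The reporting requested in the second minor comment can be sketched as below: build stratified folds, then summarize fold accuracies as mean and sample standard deviation. The fold construction is a simple round-robin scheme chosen for illustration, not necessarily what the authors used, and the fold accuracies are made-up placeholders rather than values from the paper.

```python
import random
import statistics
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds with near-equal class proportions."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets its share.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def summarize(fold_accuracies):
    """Return (mean, sample standard deviation) across folds."""
    return statistics.mean(fold_accuracies), statistics.stdev(fold_accuracies)


# Toy label vector: two classes with ten samples each.
labels = ["onion"] * 10 + ["garlic"] * 10
folds = stratified_folds(labels, k=5)

# Placeholder accuracies, NOT results from the paper.
fold_accuracies = [0.90, 0.92, 0.88, 0.91, 0.89]
mean_acc, std_acc = summarize(fold_accuracies)
print(f"{mean_acc:.3f} +/- {std_acc:.3f} over {len(fold_accuracies)} folds")
```

Reporting the spread alongside the mean lets a reader judge whether a gap between two models, such as 93 versus 94 percent, is larger than the fold-to-fold variation.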

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below, indicating where revisions will be made to improve clarity and acknowledge study limitations.

  1. Referee: [Abstract] Abstract: the claim that the framework enables 'reliable identification and quality control' rests on 93–94% test accuracies, yet the abstract (and, by extension, the results) supplies no sample counts per class, sensor model numbers, or explicit checks for batch effects, sensor drift, or ambient-condition variation; without these, the generalization from controlled lab samples to real-world use remains unverified.

    Authors: We agree that the abstract would benefit from additional specifics. We will revise the abstract to report the sample counts per class and the exact commercial sensor models employed. Our data collection occurred under controlled laboratory conditions without dedicated experiments for batch effects, sensor drift, or ambient variations; we will add a limitations paragraph in the discussion section to explicitly note these factors and their implications for generalization. revision: yes

  2. Referee: [Results] Results section (cross-validation and test-set reporting): the 93–94% accuracies for tuned RF, ensemble, and ANN are obtained entirely on data from a single Arduino setup under laboratory conditions; no independent test set collected under altered humidity, temperature, or sample-preparation protocols is reported, so the central claim that the sensor responses encode persistent class signatures is only moderately supported.

    Authors: The referee accurately observes that all reported results derive from a single hardware setup and laboratory environment, with performance evaluated via cross-validation and an internal held-out test split. We will revise the results and discussion sections to clarify that the demonstrated class signatures and accuracies apply specifically to these controlled conditions. We will also expand the text to highlight the value of future external validation under varying environmental and preparation protocols while maintaining that the current empirical comparison of models on the in-house dataset remains a valid contribution. revision: partial

Circularity Check

0 steps flagged

No circularity: purely empirical ML pipeline on in-house sensor data


The paper describes an experimental workflow: in-house collection of sensor readings for ten specific classes using an Arduino setup, correlation-based feature selection, PCA/LDA dimensionality reduction, training of tuned supervised models (RF, ANN, ensemble, etc.), and cross-validation to obtain 93–94% test accuracies. No equations, derivations, or first-principles results are present that reduce to fitted parameters or self-citations by construction. The reported performance metrics are direct empirical outcomes of training and evaluating on the collected dataset; the central claim does not rely on any self-referential loop or imported uniqueness theorem. This is a standard applied ML study whose validity hinges on external generalization (the concern raised in the referee report) rather than internal circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the untested assumption that the three commercial sensors produce sufficiently discriminative signals for the chosen organic classes and that the laboratory collection protocol captures the relevant variability.

axioms (1)
  • Domain assumption: sensor outputs from the three commercial units are stable and repeatable enough across measurement sessions to serve as reliable features.
    Invoked implicitly when the authors treat raw sensor readings as input features after correlation filtering.
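One way to probe this assumption is to compare the per-session mean of a sensor feature for the same substance and look at the relative spread between sessions. The sketch below is hypothetical throughout: the session names, the readings, and the choice of the coefficient of variation of session means as the statistic are all illustrative, and nothing here is taken from the paper.

```python
import statistics


def session_spread(readings_by_session):
    """Coefficient of variation of per-session means for one feature.

    readings_by_session: dict of session name -> list of readings taken on
    the same substance. A value near 0 means sessions agree closely.
    """
    session_means = [statistics.mean(r) for r in readings_by_session.values()]
    grand_mean = statistics.mean(session_means)
    if grand_mean == 0:
        raise ValueError("relative spread is undefined for a zero grand mean")
    return statistics.pstdev(session_means) / abs(grand_mean)


# Hypothetical gas-sensor readings for one substance across sessions.
stable = {
    "day1": [100.0, 102.0, 98.0],
    "day2": [101.0, 99.0, 100.0],
}
drifting = {
    "day1": [100.0, 102.0, 98.0],
    "day2": [101.0, 99.0, 100.0],
    "day3": [130.0, 128.0, 132.0],
}

print(session_spread(stable), round(session_spread(drifting), 4))
```

If the between-session spread for one class is comparable to the gap between two classes, a model trained on early sessions may misclassify later ones. Whether that happens with these sensors is exactly what the assumption leaves untested.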



