pith. machine review for the scientific record.

arxiv: 2604.21527 · v1 · submitted 2026-04-23 · 💻 cs.LG

Recognition: unknown

A temporal deep learning framework for calibration of low-cost air quality sensors

Pith reviewed 2026-05-09 23:03 UTC · model grok-4.3

classification 💻 cs.LG
keywords low-cost air quality sensors · LSTM calibration · temporal dependencies · PM2.5 · PM10 · NO2 · sensor drift

The pith

An LSTM framework calibrates low-cost air quality sensors for PM2.5, PM10 and NO2 by learning temporal dependencies, and the calibrated outputs meet regulatory uncertainty limits.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a deep learning approach to correct data from affordable sensors that track fine particles and nitrogen dioxide. It uses an LSTM network to process sequences of readings and thereby account for how past conditions and environmental factors influence current measurements. The model is trained on paired data from cheap sensors and official instruments in Oxford, with features that include time lags, periodic encodings, and interaction terms. This yields higher accuracy than random forest models that treat each reading in isolation, and the corrected values satisfy official equivalence checks for all three pollutants. If the patterns hold, cities could deploy many more low-cost sensors to build denser air quality maps.

Core claim

The central claim is that an LSTM network trained on co-located low-cost and reference measurements from the OxAria network captures temporal dependencies through sequence-based learning and a feature set of time-lagged parameters, harmonic encodings, and interaction terms. This produces higher R2 values than a random forest baseline across training, validation, and test sets for PM2.5, PM10, and NO2. The resulting calibrated outputs meet regulatory standards with expanded uncertainties of 22.11% for NO2, 12.42% for PM10, and 9.1% for PM2.5 when checked with the Equivalence Spreadsheet Tool.
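
To make the three feature families concrete, here is a minimal sketch in plain Python. The field names (`hour`, `pm25_raw`, `rh`), the single one-step lag, the 24-hour period, and the humidity interaction are assumptions chosen for illustration; this summary does not give the paper's actual lag lengths or interaction terms.

```python
import math

def build_features(rows, lag=1, period_hours=24.0):
    """Add lagged, harmonic, and interaction features to each reading.

    `rows` is a time-ordered list of dicts with hypothetical keys
    'hour' (hour of day), 'pm25_raw' (sensor reading) and 'rh'
    (relative humidity, %). The first `lag` rows are dropped because
    they have no lagged value available.
    """
    out = []
    for i in range(lag, len(rows)):
        cur, prev = rows[i], rows[i - lag]
        angle = 2.0 * math.pi * cur["hour"] / period_hours
        out.append({
            "pm25_raw": cur["pm25_raw"],
            "rh": cur["rh"],
            # time-lagged parameter: the reading `lag` steps earlier
            "pm25_raw_lag": prev["pm25_raw"],
            # harmonic encoding: hour 23 and hour 0 map to nearby points
            "hour_sin": math.sin(angle),
            "hour_cos": math.cos(angle),
            # interaction term: sensor reading times humidity
            "pm25_x_rh": cur["pm25_raw"] * cur["rh"],
        })
    return out

# Invented readings, for illustration only
readings = [
    {"hour": 0, "pm25_raw": 10.0, "rh": 80.0},
    {"hour": 6, "pm25_raw": 12.0, "rh": 70.0},
    {"hour": 12, "pm25_raw": 8.0, "rh": 50.0},
]
features = build_features(readings)
```

The sine and cosine pair is used instead of the raw hour so that the model sees midnight as adjacent to 23:00 rather than 23 units away.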

What carries the argument

The long short-term memory network that processes sequences of sensor observations together with time-lagged and interaction features to model delayed environmental effects.
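
A recurrent model of this kind consumes fixed-length sequences rather than single rows. The sketch below shows one common way to cut a time-ordered feature series into such sequences; the window length of 4 and the choice to predict the reference value at the last step of each window are assumptions, since the summary does not state the sequence length or target alignment used in the paper.

```python
def make_windows(features, targets, window=4):
    """Pair each run of `window` consecutive feature vectors with the
    reference value at the final time step of that run."""
    if len(features) != len(targets):
        raise ValueError("features and targets must have equal length")
    X, y = [], []
    for end in range(window, len(features) + 1):
        X.append(features[end - window:end])  # shape: window x n_features
        y.append(targets[end - 1])            # target aligned to last step
    return X, y

# Invented series: 6 time steps, 2 features per step
feats = [[float(t), float(t) * 2.0] for t in range(6)]
refs = [float(t) + 0.5 for t in range(6)]
X, y = make_windows(feats, refs, window=4)
```

A random forest fed one row at a time sees only the last vector of each window, which is the contrast the paper draws.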

If this is right

  • The LSTM model records higher R2 values than random forest on training, validation, and test sets for all three pollutants.
  • Calibrated low-cost sensor outputs satisfy regulatory equivalence criteria with the stated expanded uncertainties for NO2, PM10, and PM2.5.
  • The engineered feature set of time lags, harmonics, and interactions improves performance on temporal windows not seen during training.
  • The framework supports deployment of dense low-cost sensor networks for urban air quality monitoring.
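
For readers who want to reproduce the kind of comparison described in these points, the sketch below computes R2 together with the MAE and RMSE that appear in the paper's result tables. It assumes R2 means the coefficient of determination, 1 - SS_res / SS_tot; the summary does not say whether the paper uses that definition or the squared correlation. The numbers are invented.

```python
import math

def regression_metrics(reference, predicted):
    """Return R2, MAE and RMSE of `predicted` against `reference`."""
    n = len(reference)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_ref = sum(reference) / n
    ss_res = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    if ss_tot == 0:
        raise ValueError("R2 is undefined for a constant reference series")
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": sum(abs(r - p) for r, p in zip(reference, predicted)) / n,
        "rmse": math.sqrt(ss_res / n),
    }

# Invented values, for illustration only
reference = [10.0, 12.0, 14.0, 16.0]
calibrated = [11.0, 12.0, 13.0, 17.0]
metrics = regression_metrics(reference, calibrated)
```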

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the learned temporal structures are specific to Oxford conditions, the model would need adaptation or retraining when moved to other climates or hardware.
  • Longer intervals between reference co-locations might become feasible if the sequence model maintains accuracy over time.
  • The same sequence-learning approach could be tested on other pollutants or sensor types to see whether delayed-effect modeling generalizes.
  • A direct check would run the fixed model on data from a second city and measure whether uncertainties remain within the reported bounds.

Load-bearing premise

The temporal patterns, lags, and interaction terms learned from the single Oxford sensor network will generalize to unseen locations, different sensor batches, and future time windows without retraining or loss of accuracy.

What would settle it

Applying the Oxford-trained model without modification to new co-located low-cost and reference data from another city or a later time period and checking whether the expanded uncertainties stay at or below 22.11% for NO2, 12.42% for PM10, and 9.1% for PM2.5.
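
The pass/fail logic of that test is simple enough to sketch. The bounds below are the expanded uncertainties reported in the paper; the second-site values are invented for illustration, and in practice they would come from running the Equivalence Spreadsheet Tool on the new co-located data, which this sketch does not attempt to reproduce.

```python
# Expanded uncertainties (%) reported in the paper for the Oxford data
REPORTED_BOUNDS = {"NO2": 22.11, "PM10": 12.42, "PM2.5": 9.1}

def within_reported_bounds(new_uncertainties, bounds=REPORTED_BOUNDS):
    """Return {pollutant: True/False} for 'new value <= reported bound'."""
    return {
        pollutant: new_uncertainties[pollutant] <= limit
        for pollutant, limit in bounds.items()
    }

# Hypothetical second-site results, invented purely for illustration
second_site = {"NO2": 24.0, "PM10": 12.0, "PM2.5": 9.1}
verdict = within_reported_bounds(second_site)
```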

Figures

Figures reproduced from arXiv: 2604.21527 by Arindam Sengupta, Ben Marner, Jose Miguel Pérez, Soledad Le Clainche, Tony Bush.

Figure 1: Overview of the calibration methodology: (a) raw sensor data input, (b) feature …
Figure 2: Time series of meteorological variables: (a) Relative humidity and (b) temper…
Figure 3: Comparison of low-cost sensor measurements with AURN reference data at …
Figure 4: Calibrated vs. reference concentrations across validation and test datasets for …
Figure 5: Time series of PM2.5 for unseen data: (a) reference and calibrated series at 15-minute resolution and (b) hourly-averaged reference and calibrated series.
  … good performance under unseen conditions, with moderate increases in error compared to the testing period.

  Type                   R2     MAE (µg/m3)   RMSE (µg/m3)
  Unseen Test (15-min)   0.71   2.08          2.77
  Unseen Test (1H avg)   0.74   1.90          2.53

Figure 6: Time series of PM10 for unseen data: (a) reference and calibrated series at 15-minute resolution and (b) hourly-averaged reference and calibrated series.
  … by about 14.0% and 16.1%, respectively. This represents the largest relative improvement among the three pollutants and highlights the benefit of using longer periods for gases with higher short-term variability.

  Dataset   R2   MAE (µg/m3)   RMSE (µg/m3)
  Unse…

Figure 7: Time series of NO2 for unseen data: (a) reference and calibrated series at 15-minute resolution and (b) hourly-averaged reference and calibrated series.
  … calibration pipeline achieved consistently improved R2 values across training, validation, and test sets compared to the RF method reported by Bush et al. [19], while maintaining low error metrics. Further, the results have also been assessed using the Eq…
Original abstract

Low-cost air quality sensors (LCS) provide a practical alternative to expensive regulatory-grade instruments, making dense urban monitoring networks possible. Yet their adoption is limited by calibration challenges, including sensor drift, environmental cross-sensitivity, and variability in performance from device to device. This work presents a deep learning framework for calibrating LCS measurements of PM$_{2.5}$, PM$_{10}$, and NO$_2$ using a Long Short-Term Memory (LSTM) network, trained on co-located reference data from the OxAria network in Oxford, UK. Unlike the Random Forest (RF) baseline, which treats each observation independently, the proposed approach captures temporal dependencies and delayed environmental effects through sequence-based learning, achieving higher $R^2$ values across training, validation, and test sets for all three pollutants. A feature set is constructed combining time-lagged parameters, harmonic encodings, and interaction terms to improve generalization on unseen temporal windows. Validation of unseen calibrated values against the Equivalence Spreadsheet Tool 3.1 demonstrates regulatory compliance with expanded uncertainties of 22.11% for NO$_2$, 12.42% for PM$_{10}$, and 9.1% for PM$_{2.5}$.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes an LSTM-based temporal deep learning framework for calibrating low-cost sensors measuring PM2.5, PM10, and NO2. Using co-located reference data from the single-site OxAria network in Oxford, UK, it constructs features with time lags, harmonic encodings, and interaction terms to capture temporal dependencies and delayed effects. The LSTM is reported to outperform a Random Forest baseline with higher R² across train/validation/test splits and to achieve regulatory compliance via expanded uncertainties of 22.11% (NO2), 12.42% (PM10), and 9.1% (PM2.5) when validated with the Equivalence Spreadsheet Tool 3.1.

Significance. If the temporal modeling improvements hold under broader testing, the framework could support more reliable dense urban air-quality networks by addressing drift and cross-sensitivities through sequence learning rather than independent observations. The explicit use of the regulatory equivalence tool strengthens practical relevance. However, the single-site evaluation restricts claims of general applicability, so the work's impact would increase substantially with multi-location validation.

major comments (2)
  1. [Abstract] Abstract: The central claim that the LSTM framework 'improves generalization on unseen temporal windows' and provides a general calibration solution is load-bearing, yet all reported R² values and compliance metrics derive exclusively from temporal partitions of the OxAria dataset collected at one location; no cross-site, cross-device, or future-window external validation is described, leaving transferability untested.
  2. [Methods/Results] Methods and Results: The feature set includes free parameters for lag lengths and interaction terms in addition to the LSTM weights; without an ablation isolating the contribution of the sequence-based LSTM versus the engineered features, or details on dataset size, number of sensors, and cross-validation procedure, the source of the reported R² gains over RF cannot be verified.
minor comments (2)
  1. The abstract omits key experimental details such as the total number of observations, time span of the OxAria data, and exact values of the chosen lag lengths.
  2. No statistical significance tests (e.g., paired t-test or bootstrap) are mentioned for the R² differences between LSTM and RF, which would strengthen the comparison.
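
One way to address the second minor comment is a paired bootstrap on the difference in R2 between two models. The sketch below is an illustration only: it uses invented data, assumes R2 is the coefficient of determination, and resamples individual observations, which ignores the autocorrelation present in real sensor time series (a block bootstrap would be the more careful choice there).

```python
import random

def r2_score(reference, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_ref = sum(reference) / len(reference)
    ss_res = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    return 1.0 - ss_res / ss_tot

def bootstrap_r2_gap(reference, pred_a, pred_b, n_boot=1000, seed=0):
    """Percentile 95% interval for R2(pred_a) - R2(pred_b).

    Paired: both models are scored on the same resampled indices.
    """
    rng = random.Random(seed)
    n = len(reference)
    gaps = []
    while len(gaps) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ref = [reference[i] for i in idx]
        if min(ref) == max(ref):
            continue  # R2 is undefined for a constant resample
        a = [pred_a[i] for i in idx]
        b = [pred_b[i] for i in idx]
        gaps.append(r2_score(ref, a) - r2_score(ref, b))
    gaps.sort()
    cut = int(0.025 * n_boot)
    return gaps[cut], gaps[n_boot - 1 - cut]

# Invented data: model_a has small errors, model_b has large ones
reference = [float(i) for i in range(30)]
model_a = [r + 0.1 * (-1) ** i for i, r in enumerate(reference)]
model_b = [r + 5.0 * (-1) ** i for i, r in enumerate(reference)]
low, high = bootstrap_r2_gap(reference, model_a, model_b, n_boot=500)
```

If the resulting interval excludes zero, the R2 gap is unlikely to be an artifact of which observations happened to land in the test set.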

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed review and constructive suggestions. We address the major comments point by point below. We have revised the manuscript to clarify the scope of our claims and to provide additional methodological details.

  1. Referee: [Abstract] Abstract: The central claim that the LSTM framework 'improves generalization on unseen temporal windows' and provides a general calibration solution is load-bearing, yet all reported R² values and compliance metrics derive exclusively from temporal partitions of the OxAria dataset collected at one location; no cross-site, cross-device, or future-window external validation is described, leaving transferability untested.

    Authors: We acknowledge that the evaluation is based on a single site (OxAria network). However, the temporal partitions specifically test performance on unseen future temporal windows at that location, which addresses drift and temporal dependencies. We have revised the abstract to remove the phrasing 'provides a general calibration solution' and instead state that it improves generalization on unseen temporal windows within the study site. We agree that cross-site validation would strengthen the work and note this as a limitation for future research. The regulatory compliance is demonstrated for this deployment.
    revision: partial

  2. Referee: [Methods/Results] Methods and Results: The feature set includes free parameters for lag lengths and interaction terms in addition to the LSTM weights; without an ablation isolating the contribution of the sequence-based LSTM versus the engineered features, or details on dataset size, number of sensors, and cross-validation procedure, the source of the reported R² gains over RF cannot be verified.

    Authors: We will expand the Methods section to include the dataset size (number of hourly observations per pollutant), the number of sensors (co-located units from the OxAria network), and details on the temporal cross-validation procedure used for train/validation/test splits. Regarding the ablation, the Random Forest baseline employs the same feature set but processes each observation independently without sequence modeling. To isolate the LSTM's contribution, we will add an ablation study comparing the full LSTM model against a non-recurrent model using only the engineered features, and include these results in the revised manuscript to verify the source of the performance gains.
    revision: yes
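
The temporal partitions discussed in both responses can be illustrated with a chronological split, in which later data is never used to fit a model evaluated on earlier data. This is a sketch under assumptions: the 70/15/15 ratios are invented, the records are taken to be already sorted by time, and the paper's actual split procedure is not described in this summary.

```python
def chronological_split(records, train_frac=0.7, val_frac=0.15):
    """Split time-ordered records into train/validation/test blocks
    without shuffling, so the test block is the most recent data."""
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave room for a test set")
    n = len(records)
    train_end = round(n * train_frac)
    val_end = round(n * (train_frac + val_frac))
    return records[:train_end], records[train_end:val_end], records[val_end:]

# Invented timeline of 20 time steps, oldest first
timeline = list(range(20))
train, val, test = chronological_split(timeline)
```

A split like this tests future windows at the same site only; it says nothing about other sites or devices, which is the gap the referee points to.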

Circularity Check

1 steps flagged

Reported R² values and regulatory compliance reduce to fitted performance on temporal splits of the single OxAria dataset

specific steps
  1. fitted input called prediction [Abstract]
    "achieving higher R^2 values across training, validation, and test sets for all three pollutants. A feature set is constructed combining time-lagged parameters, harmonic encodings, and interaction terms to improve generalization on unseen temporal windows. Validation of unseen calibrated values against the Equivalence Spreadsheet Tool 3.1 demonstrates regulatory compliance with expanded uncertainties of 22.11% for NO2, 12.42% for PM10, and 9.1% for PM2.5."

    The LSTM parameters are fitted to the OxAria co-located dataset; the quoted R² figures and expanded uncertainties are then computed on temporal hold-outs from that identical dataset. The 'achieved' performance and compliance are therefore statistically forced outputs of the fit rather than independent predictions or derivations external to the training data.

full rationale

The paper trains an LSTM on co-located OxAria reference data and reports higher R² on train/val/test splits plus Equivalence Tool compliance on the resulting calibrated values. These metrics are direct outputs of fitting the model (with its lagged features, harmonics, and interactions) to the same dataset's temporal partitions; no independent cross-site, cross-device, or future external benchmark is shown. This constitutes one instance of 'fitted input called prediction' with partial circularity in the generalization claim, while the core architecture itself is not self-referential.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The claim rests on supervised training of a high-capacity neural network on one specific co-located dataset plus the assumption that temporal structure learned there transfers elsewhere.

free parameters (2)
  • LSTM weights and biases
    High-dimensional parameters fitted during training on reference data
  • Lag lengths and interaction terms
    Chosen by authors to encode temporal and cross effects
axioms (2)
  • domain assumption Sensor errors exhibit learnable temporal dependencies and delayed environmental responses
    Invoked to justify sequence modeling over independent observations
  • domain assumption The OxAria training distribution is representative for generalization to new temporal windows and locations
    Required for the claim that the model works on unseen data

pith-pipeline@v0.9.0 · 5527 in / 1345 out tokens · 30309 ms · 2026-05-09T23:03:51.874129+00:00 · methodology

Reference graph

Works this paper leans on

47 extracted references · 4 canonical work pages

  1. [1]

    Ambient (outdoor) air pollution

    World Health Organization, “Ambient (outdoor) air pollution.” Fact Sheet,

  2. [2]

    Accessed: 2025

    Available at: https://www.who.int/news-room/fact-sheets/detail/ambient-(outdoor)-air-quality-and-health. Accessed: 2025.

  3. [3]

    Global urban temporal trends in fine particulate matter (PM2·5) and attributable health burdens: estimates from global datasets,

    V. A. Southerland, M. Brauer, A. Mohegh, M. S. Hammer, A. Van Donkelaar, R. V. Martin, J. S. Apte, and S. C. Anenberg, “Global urban temporal trends in fine particulate matter (PM2·5) and attributable health burdens: estimates from global datasets,” The Lancet Planetary Health, vol. 6, no. 2, pp. e139–e146, 2022

  4. [4]

    Review of the performance of low-cost sensors for air quality monitoring,

    F. Karagulian, M. Barbiere, A. Kotsev, L. Spinelle, M. Gerboles, F. Lagler, N. Redon, S. Crunaire, and A. Borowiak, “Review of the performance of low-cost sensors for air quality monitoring,” Atmosphere, vol. 10, no. 9, p. 506, 2019

  5. [5]

    Perspectives on the calibration and validation of low-cost air quality sensors,

    T.-B. Ottosen, “Perspectives on the calibration and validation of low-cost air quality sensors,” Environmental Science & Technology, vol. 55, no. 19, pp. 12773–12775, 2021

  6. [6]

    Calibrating low-cost sensors for ambient air monitoring: Techniques, trends, and challenges,

    L. Liang, “Calibrating low-cost sensors for ambient air monitoring: Techniques, trends, and challenges,” Environmental Research, vol. 197, p. 111163, 2021

  7. [7]

    In-field calibration of low-cost sensors through XGBoost and aggregate sensor data,

    K. Yin, J. Gersey, and P. Zhang, “In-field calibration of low-cost sensors through XGBoost and aggregate sensor data,” arXiv preprint arXiv:2506.15840, 2025

  8. [8]

    Challenges and opportunities in calibrating low-cost environmental sensors,

    N. V. S. R. Nalakurthi, I. Abimbola, T. Ahmed, I. Anton, K. Riaz, Q. Ibrahim, A. Banerjee, A. Tiwari, and S. Gharbia, “Challenges and opportunities in calibrating low-cost environmental sensors,” Sensors, vol. 24, no. 11, p. 3650, 2024

  9. [9]

    On the reliability of machine learning applications in manufacturing environments,

    N. Jourdan, S. Sen, E. J. Husom, E. Garcia-Ceja, T. Biegel, and J. Metternich, “On the reliability of machine learning applications in manufacturing environments,” arXiv preprint arXiv:2112.06986, 2021

  10. [10]

    Blind calibration of air quality wireless sensor networks using deep neural networks,

    T. Veiga, E. Ljunggren, K. Bach, and S. Akselsen, “Blind calibration of air quality wireless sensor networks using deep neural networks,” in 2021 IEEE International Conference on Omni-Layer Intelligent Systems (COINS), pp. 1–6, IEEE, 2021

  11. [11]

    Evaluation and calibration of a low-cost particle sensor in ambient conditions using machine-learning methods,

    M. Si, Y. Xiong, S. Du, and K. Du, “Evaluation and calibration of a low-cost particle sensor in ambient conditions using machine-learning methods,” Atmospheric Measurement Techniques, vol. 13, no. 4, pp. 1693–1707, 2020

  12. [12]

    Assessment and calibration of a low-cost PM2.5 sensor using machine learning (HybridLSTM neural network): Feasibility study to build an air quality monitoring system,

    D. Park, G.-W. Yoo, S.-H. Park, and J.-H. Lee, “Assessment and calibration of a low-cost PM2.5 sensor using machine learning (HybridLSTM neural network): Feasibility study to build an air quality monitoring system,” Atmosphere, vol. 12, no. 10, p. 1306, 2021

  13. [13]

    Machine learning calibration of low-cost NO2 and PM10 sensors: Non-linear algorithms and their impact on site transferability,

    P. Nowack, L. Konstantinovskiy, H. Gardiner, and J. Cant, “Machine learning calibration of low-cost NO2 and PM10 sensors: Non-linear algorithms and their impact on site transferability,” Atmospheric Measurement Techniques, vol. 14, no. 8, pp. 5637–5655, 2021.

  14. [14]

    A machine learning calibration model using random forests to improve sensor performance for lower-cost air quality monitoring,

    N. Zimmerman, A. A. Presto, S. P. Kumar, J. Gu, A. Hauryliuk, E. S. Robinson, A. L. Robinson, et al., “A machine learning calibration model using random forests to improve sensor performance for lower-cost air quality monitoring,” Atmospheric Measurement Techniques, vol. 11, no. 1, pp. 291–313, 2018

  15. [15]

    Field calibration of a low-cost air quality monitoring device in an urban background site using machine learning models,

    I. D. Apostolopoulos, G. Fouskas, and S. N. Pandis, “Field calibration of a low-cost air quality monitoring device in an urban background site using machine learning models,” Atmosphere, vol. 14, no. 2, p. 368, 2023

  16. [16]

    Field calibration of a cluster of low-cost commercially available sensors for air quality monitoring. Part B: NO, CO and CO2,

    L. Spinelle, M. Gerboles, M. G. Villani, M. Aleixandre, and F. Bonavitacola, “Field calibration of a cluster of low-cost commercially available sensors for air quality monitoring. Part B: NO, CO and CO2,” Sensors and Actuators B: Chemical, vol. 238, pp. 706–715, 2017

  17. [17]

    Machine learning for urban air quality analytics: A survey,

    J. Han, W. Zhang, H. Liu, and H. Xiong, “Machine learning for urban air quality analytics: A survey,” arXiv preprint arXiv:2310.09620, 2023

  18. [18]

    Dynamic calibration of low-cost PM2.5 sensors using trust-based consensus mechanisms,

    S. Mahajan and D. Helbing, “Dynamic calibration of low-cost PM2.5 sensors using trust-based consensus mechanisms,” npj Climate and Atmospheric Science, vol. 8, no. 1, p. 257, 2025

  19. [19]

    Performance of NO, NO2 low cost sensors and three calibration approaches within a real world application,

    A. Bigi, M. Mueller, S. K. Grange, G. Ghermandi, and C. Hueglin, “Performance of NO, NO2 low cost sensors and three calibration approaches within a real world application,” Atmospheric Measurement Techniques, vol. 11, no. 6, pp. 3717–3735, 2018

  20. [20]

    Machine learning techniques to improve the field performance of low-cost air quality sensors,

    T. Bush, N. Papaioannou, F. Leach, F. D. Pope, A. Singh, G. N. Thomas, B. Stacey, and S. Bartington, “Machine learning techniques to improve the field performance of low-cost air quality sensors,” Atmospheric Measurement Techniques Discussions, vol. 2021, pp. 1–34, 2021. http://doi.org/10.5194/amt-2021-282

  21. [21]

    Deep spatio-temporal residual networks for city-wide crowd flows prediction,

    J. Zhang, Y. Zheng, and D. Qi, “Deep spatio-temporal residual networks for city-wide crowd flows prediction,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 31, 2017

  22. [22]

    Deep air quality forecasting using hybrid deep learning framework,

    S. Du, T. Li, Y. Yang, and S.-J. Horng, “Deep air quality forecasting using hybrid deep learning framework,” IEEE Transactions on Knowledge and Data Engineering, vol. 33, no. 6, pp. 2412–2424, 2019

  23. [23]

    Deep Learning

    I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org. Accessed: 2025

  24. [24]

    Long short-term memory,

    S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.

  25. [25]

    Calibrations of low-cost air pollution monitoring sensors for CO, NO2, O3, and SO2,

    P. Han, H. Mei, D. Liu, N. Zeng, X. Tang, Y. Wang, and Y. Pan, “Calibrations of low-cost air pollution monitoring sensors for CO, NO2, O3, and SO2,” Sensors, vol. 21, no. 1, p. 256, 2021

  26. [26]

    Band-sensitive calibration of low-cost PM2.5 sensors by LSTM model with dynamically weighted loss function,

    J. Ryu and H. Park, “Band-sensitive calibration of low-cost PM2.5 sensors by LSTM model with dynamically weighted loss function,” Sustainability, vol. 14, no. 10, p. 6120, 2022

  27. [27]

    Few-shot calibration of low-cost air pollution (PM$_{2.5}$) sensors using meta learning,

    K. Yadav, V. Arora, M. Kumar, S. N. Tripathi, V. M. Motghare, and K. A. Rajput, “Few-shot calibration of low-cost air pollution (PM$_{2.5}$) sensors using meta learning,” IEEE Sensors Letters, vol. 6, no. 5, pp. 1–4, 2022

  28. [28]

    Developing a relative humidity correction for low-cost sensors measuring ambient particulate matter,

    A. Di Antonio, O. A. Popoola, B. Ouyang, J. Saffell, and R. L. Jones, “Developing a relative humidity correction for low-cost sensors measuring ambient particulate matter,” Sensors, vol. 18, no. 9, p. 2790, 2018

  29. [29]

    Towards a hygroscopic growth calibration for low-cost PM2.5 sensors,

    M. Y. Patel, P. F. Vannucci, J. Kim, W. M. Berelson, and R. C. Cohen, “Towards a hygroscopic growth calibration for low-cost PM2.5 sensors,” Atmospheric Measurement Techniques, vol. 17, no. 3, pp. 1051–1060, 2024

  30. [30]

    Performance assessment of low- and medium-cost PM2.5 sensors in real-world conditions in Central Europe,

    B. Atfeh, Z. Barcza, V. Groma, Á. V. Tordai, and R. Mészáros, “Performance assessment of low- and medium-cost PM2.5 sensors in real-world conditions in Central Europe,” Atmosphere, vol. 16, no. 7, p. 796, 2025

  31. [31]

    IoT based air pollution monitoring & prediction system,

    M. Rakib, S. Haq, M. I. Hossain, and T. Rahman, “IoT based air pollution monitoring & prediction system,” in 2022 International Conference on Innovations in Science, Engineering and Technology (ICISET), pp. 184–189, IEEE, 2022

  32. [32]

    Deep learning architecture for air quality predictions,

    X. Li, L. Peng, Y. Hu, J. Shao, and T. Chi, “Deep learning architecture for air quality predictions,” Environmental Science and Pollution Research, vol. 23, no. 22, pp. 22408–22417, 2016

  33. [33]

    Sensor based ambient air concentration data for nitrogen dioxide and particles in Oxford, measured by the OxAria project 2020 to 2021,

    A. Bush, N. Papaioannou, F. Leach, F. D. Pope, A. Singh, G. N. Thomas, B. Stacey, and S. Bartington, “Sensor based ambient air concentration data for nitrogen dioxide and particles in Oxford, measured by the OxAria project 2020 to 2021,” 2022. http://ora.ox.ac.uk/objects/uuid:66fbe8c1-4b63-4124-bf0d-a78cbc9e1408. Accessed: 2025

  34. [34]

    Can commercial low-cost sensor platforms contribute to air quality monitoring and exposure estimates?,

    N. Castell, F. R. Dauge, P. Schneider, M. Vogt, U. Lerner, B. Fishbain, D. Broday, and A. Bartonova, “Can commercial low-cost sensor platforms contribute to air quality monitoring and exposure estimates?,” Environment International, vol. 99, pp. 293–302, 2017

  35. [35]

    Recursive and rolling windows for medical time series forecasting: a comparative study,

    L. B. Amor, I. Lahyani, and M. Jmaiel, “Recursive and rolling windows for medical time series forecasting: a comparative study,” in 2016 IEEE Intl Conference on Computational Science and Engineering (CSE) and IEEE Intl Conference on Embedded and Ubiquitous Computing (EUC) and 15th Intl Symposium on Distributed Computing and Applications for Business...

  36. [36]

    Rolling window time series prediction using MapReduce,

    L. Li, F. Noorian, D. J. Moss, and P. H. Leong, “Rolling window time series prediction using MapReduce,” in Proceedings of the 2014 IEEE 15th International Conference on Information Reuse and Integration (IEEE IRI 2014), pp. 757–764, IEEE, 2014

  37. [37]

    Random search for hyper-parameter optimization,

    J. Bergstra and Y. Bengio, “Random search for hyper-parameter optimization,” The Journal of Machine Learning Research, vol. 13, no. 1, pp. 281–305, 2012

  38. [38]

    Dropout: a simple way to prevent neural networks from overfitting,

    N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting,” The Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014

  39. [39]

    Assessment of the performance of a low-cost air quality monitor in an indoor environment through different calibration models,

    D. Suriano and M. Penza, “Assessment of the performance of a low-cost air quality monitor in an indoor environment through different calibration models,” Atmosphere, vol. 13, p. 567, 03 2022

  40. [40]

    Array programming with NumPy,

    C. R. Harris, K. J. Millman, S. J. Van Der Walt, R. Gommers, P. Virtanen, D. Cournapeau, E. Wieser, J. Taylor, S. Berg, N. J. Smith, et al., “Array programming with NumPy,” Nature, vol. 585, no. 7825, pp. 357–362, 2020

  41. [41]

    Guide to the demonstration of equivalence of ambient air monitoring methods

    European Commission, “Guide to the demonstration of equivalence of ambient air monitoring methods.” European Commission Working Group on Guidance for the Demonstration of Equivalence, 2010. http://environment.ec.europa.eu/topics/air/air-quality/assessment_en. Accessed: 2025

  42. [42]

    Directive 2008/50/EC of the European Parliament and of the Council of 21 May 2008 on ambient air quality and cleaner air for Europe,

    European Commission, “Directive 2008/50/EC of the European Parliament and of the Council of 21 May 2008 on ambient air quality and cleaner air for Europe,” 2008. http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:32008L0050. Accessed: 2025

  43. [43]

    Scikit-learn: Machine learning in Python,

    F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, et al., “Scikit-learn: Machine learning in Python,” The Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011

  44. [44]

    Modelflows-app: data-driven post-processing and reduced order modelling tools,

    A. Hetherington, A. Corrochano, R. Abadía-Heredia, E. Lazpita, E. Muñoz, P. Díaz, E. Maiora, M. López-Martín, and S. Le Clainche, “Modelflows-app: data-driven post-processing and reduced order modelling tools,” Computer Physics Communications, vol. 301, p. 109217, 2024

  45. [45]

    Hierarchical higher-order dynamic mode decomposition for clustering and feature selection,

    A. Corrochano, G. D’Alessio, A. Parente, and S. Le Clainche, “Hierarchical higher-order dynamic mode decomposition for clustering and feature selection,” Computers & Mathematics with Applications, vol. 158, pp. 36–45, 2024.

  46. [46]

    Data splitting technique to fit any machine learning model,

    S. Kumar, “Data splitting technique to fit any machine learning model,” Received from: www.towardsdatascience.com, 2020

  47. [47]

    Ideal dataset splitting ratios in machine learning algorithms: General concerns for data scientists and data analysts,

    I. Muraina, “Ideal dataset splitting ratios in machine learning algorithms: General concerns for data scientists and data analysts,” in Proceedings of the 7th International Mardin Artuklu Scientific Research Conference, pp. 496–504, Mardin Artuklu University, 2022.