Using Deep Learning to Predict Plant Growth and Yield in Greenhouse Environments

Bashar Alhnaity; Georgios Leontidis; Simon Pearson; Stefanos Kollias

arxiv: 1907.00624 · v1 · pith:AL4KXAJUnew · submitted 2019-07-01 · 💻 cs.LG · stat.ML

Pith reviewed 2026-05-25 12:39 UTC · model grok-4.3

keywords: deep learning, LSTM, recurrent neural networks, greenhouse, yield prediction, plant growth forecasting, tomato, Ficus benjamina

The pith

A long short-term memory recurrent neural network predicts tomato yield and ficus stem growth in greenhouses from past measurements and microclimate data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether deep learning can model how greenhouse plants grow and produce under controlled conditions. It trains an LSTM recurrent neural network on sequences of prior yield or stem diameter values together with environmental readings such as temperature and humidity. The same data are also fed to support vector regression and random forest regression for direct comparison using mean squared error. Results from tomato and Ficus benjamina trials in two separate greenhouses are reported as promising. If the learned mappings hold, growers could adjust conditions in advance to improve output and align harvests with demand.
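
The sequence framing described above can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `make_windows`, the window length, and the toy readings are assumptions introduced here.

```python
from typing import List, Sequence, Tuple


def make_windows(
    target: Sequence[float],
    climate: Sequence[Sequence[float]],
    window: int,
) -> Tuple[List[List[List[float]]], List[float]]:
    """Pair each run of `window` past time steps with the next target value.

    The feature vector at each time step is [past target value, *climate readings].
    """
    if len(target) != len(climate):
        raise ValueError("target and climate must have the same length")
    inputs, outputs = [], []
    for start in range(len(target) - window):
        steps = [[target[t], *climate[t]] for t in range(start, start + window)]
        inputs.append(steps)
        outputs.append(target[start + window])  # value to be forecast
    return inputs, outputs


# Hypothetical toy data: stem diameter (mm) with (temperature in C, humidity in %).
diameter = [10.0, 10.2, 10.5, 10.9, 11.4]
climate = [(21.0, 70.0), (22.0, 68.0), (23.0, 65.0), (22.5, 66.0), (21.5, 71.0)]

X, y = make_windows(diameter, climate, window=3)
```

Each element of `X` is one input sequence and the matching element of `y` is the value that follows it, which is the shape a sequence model such as an LSTM would consume.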

Core claim

An LSTM recurrent neural network that receives sequences of historical growth, yield, and microclimate variables produces accurate forecasts of future tomato yields and Ficus benjamina stem diameters, achieving competitive or superior mean squared error relative to support vector regression and random forest regression on data collected from Belgian and UK greenhouses.

What carries the argument

LSTM recurrent neural network that ingests time series of past yield/growth values plus microclimate conditions to output future growth parameter predictions.
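
For readers unfamiliar with the mechanism, one forward step of a textbook LSTM cell can be sketched in plain Python. This page does not give the paper's layer count, hidden size, or weights, so the dimensions and the all-zero parameters below are assumptions chosen only to make the arithmetic easy to follow.

```python
import math
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]
GATES = ("input", "forget", "output", "candidate")


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(
    x: Sequence[float],
    h_prev: Sequence[float],
    c_prev: Sequence[float],
    weights: Dict[str, Matrix],
    biases: Dict[str, List[float]],
) -> Tuple[List[float], List[float]]:
    """One forward step of a standard LSTM cell.

    weights[gate] has one row per hidden unit and is applied to the
    concatenation of the previous hidden state and the current input.
    """
    z = list(h_prev) + list(x)
    gates = {}
    for name in GATES:
        squash = math.tanh if name == "candidate" else sigmoid
        gates[name] = [
            squash(sum(w * v for w, v in zip(row, z)) + b)
            for row, b in zip(weights[name], biases[name])
        ]
    # New cell state: keep part of the old memory, add part of the candidate.
    c = [
        f * c_old + i * g
        for f, c_old, i, g in zip(
            gates["forget"], c_prev, gates["input"], gates["candidate"]
        )
    ]
    # New hidden state: gated view of the squashed cell state.
    h = [o * math.tanh(c_new) for o, c_new in zip(gates["output"], c)]
    return h, c


# Hypothetical sizes: 2 hidden units, 3 input features per time step.
hidden, features = 2, 3
zero_w = {g: [[0.0] * (hidden + features) for _ in range(hidden)] for g in GATES}
zero_b = {g: [0.0] * hidden for g in GATES}

h, c = lstm_step([10.0, 21.0, 70.0], [0.0, 0.0], [1.0, -2.0], zero_w, zero_b)
```

With all-zero parameters every sigmoid gate equals 0.5 and the candidate equals 0, so the cell state is simply halved. Trained weights would instead let the forget and input gates decide how much past growth and climate history to retain.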

If this is right

Greenhouse operators could use the forecasts to modify temperature, humidity, or irrigation schedules ahead of time to raise total yield.
Supply planning improves because predicted harvest volumes allow better matching to market orders.
Production costs fall when environmental inputs are tuned to avoid over- or under-supply.
The same sequence-based approach can be retrained on other crops grown in similar controlled environments.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Integration with automated climate-control systems would let the model drive real-time set-point changes rather than only providing advisory forecasts.
Adding sensor streams for variables the current model omits could test whether the present performance gap between sites shrinks.
Extending the input window or adding attention layers might reveal whether longer-term patterns improve multi-week yield forecasts.

Load-bearing premise

Microclimate readings and historical growth records from the two greenhouses capture all major influences on future yield and stem growth without large effects from unmeasured factors such as pests, disease, or substrate changes.

What would settle it

Apply the trained model to a new greenhouse dataset that includes documented differences in unmeasured variables and check whether prediction errors rise substantially above the levels reported for the original sites.

Original abstract

Effective plant growth and yield prediction is an essential task for greenhouse growers and for agriculture in general. Developing models which can effectively model growth and yield can help growers improve the environmental control for better production, match supply and market demand and lower costs. Recent developments in Machine Learning (ML) and, in particular, Deep Learning (DL) can provide powerful new analytical tools. The proposed study utilises ML and DL techniques to predict yield and plant growth variation across two different scenarios, tomato yield forecasting and Ficus benjamina stem growth, in controlled greenhouse environments. We deploy a new deep recurrent neural network (RNN), using the Long Short-Term Memory (LSTM) neuron model, in the prediction formulations. Both the former yield, growth and stem diameter values, as well as the microclimate conditions, are used by the RNN architecture to model the targeted growth parameters. A comparative study is presented, using ML methods, such as support vector regression and random forest regression, utilising the mean square error criterion, in order to evaluate the performance achieved by the different methods. Very promising results, based on data that have been obtained from two greenhouses, in Belgium and the UK, in the framework of the EU Interreg SMARTGREEN project (2017-2021), are presented.
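
The mean square error criterion named in the abstract can be sketched as below. The observed values, the model names, and their predictions are hypothetical placeholders, not results from the paper.

```python
from typing import Sequence


def mean_squared_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average of the squared differences between observations and predictions."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical held-out observations and predictions from two unnamed models.
observed = [2.0, 4.0, 6.0]
candidates = {
    "model_a": [2.0, 5.0, 6.0],
    "model_b": [1.0, 4.0, 8.0],
}

scores = {name: mean_squared_error(observed, preds) for name, preds in candidates.items()}
best = min(scores, key=scores.get)  # a lower MSE means a closer fit
```

Because errors are squared, a single large miss weighs more heavily than several small ones, which matters when comparing methods on short greenhouse series.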

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

LSTM applied to tomato yield and Ficus growth on two greenhouse datasets, but abstract supplies no numbers, splits, or error bars so the performance claim cannot be checked.

The paper applies an LSTM to microclimate variables plus lagged yield and growth values to forecast tomato yield and Ficus stem diameter in two controlled greenhouses from the SMARTGREEN project. The datasets are real and the task is a standard supervised time-series regression. That combination on these particular crops and sites is new relative to the cited prior work. The comparison to SVR and random forest is straightforward, and the choice of LSTM for sequential greenhouse data makes sense on its face. Credit is due for using actual project data rather than synthetic cases.

The abstract states that the LSTM beats the baselines on mean squared error, yet it reports none of the actual scores, no train/test split details, no hyper-parameter information, and no mention of preprocessing or error bars. Without those, the central empirical claim cannot be evaluated. The stress-test concern about unmeasured drivers such as pests, disease, or substrate variation is not addressed in the text, so the learned mapping may miss dominant factors even though both sites are controlled.

The work is aimed at people doing applied machine learning in greenhouse or controlled-environment agriculture. A reader looking for concrete examples of LSTM forecasting on real agricultural time series could extract some value once the numbers are supplied. It deserves peer review because the data source is legitimate and the application is practical, but the authors would need to add the missing quantitative results and some discussion of potential confounds before it could be assessed properly.

Referee Report

2 major / 1 minor

Summary. The manuscript claims that an LSTM recurrent neural network, trained on microclimate variables together with lagged yield, growth, and stem-diameter measurements, produces accurate forecasts of tomato yield and Ficus benjamina stem growth in two greenhouse settings. It presents a comparative evaluation against support-vector regression and random-forest regression using mean-squared error and asserts that the LSTM yields very promising results on data collected in Belgium and the UK under the EU Interreg SMARTGREEN project.

Significance. If the reported performance advantage can be substantiated with transparent validation protocols and the learned mappings prove robust to unmeasured factors, the approach could supply practical forecasting tools for greenhouse environmental control and supply-chain planning.

major comments (2)

[Abstract] The assertion that the LSTM outperforms SVR and random forest on mean-squared error supplies no numerical values, no description of train/test splits, no error bars, and no information on hyper-parameter search or preprocessing, rendering the central performance claim impossible to evaluate.
[Data and Methods] The manuscript does not report measurements or controls for pests, disease incidence, substrate heterogeneity, irrigation anomalies, or cultivar-specific responses; any of these could dominate yield variance and render the learned function non-causal or non-transferable across the two greenhouses.

minor comments (1)

[Abstract] The phrase 'a new deep recurrent neural network' is used without architectural diagrams, hyper-parameter tables, or explicit comparison to standard LSTM implementations.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We address the two major comments point-by-point below and indicate the revisions we will make.

Referee: [Abstract] The assertion that the LSTM outperforms SVR and random forest on mean-squared error supplies no numerical values, no description of train/test splits, no error bars, and no information on hyper-parameter search or preprocessing, rendering the central performance claim impossible to evaluate.

Authors: We agree the abstract is insufficiently informative. The results section already contains the MSE values, the 70/30 chronological train/test split, 5-fold cross-validation for hyper-parameter tuning via grid search, and standard preprocessing (z-score normalization of inputs). In the revised version we will condense these details into the abstract, adding the key MSE figures with standard deviations from repeated runs.
revision: yes

Referee: [Data and Methods] The manuscript does not report measurements or controls for pests, disease incidence, substrate heterogeneity, irrigation anomalies, or cultivar-specific responses; any of these could dominate yield variance and render the learned function non-causal or non-transferable across the two greenhouses.

Authors: The data were obtained from two commercial greenhouses operated under standard EU Interreg SMARTGREEN protocols; no dedicated sensors or logs were kept for the listed biotic or substrate factors. We therefore cannot supply those measurements. We will add an explicit limitations paragraph stating that the models capture statistical associations under the observed management regime and that unmeasured variables may affect both performance and transferability. We will also tone down any causal language.
revision: partial
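
The protocol named in the first response (a 70/30 chronological split with z-score normalization) can be sketched as follows. Those details come from the simulated rebuttal and have not been checked against the paper. The helper names, the toy temperature readings, and the choice to fit the normalization statistics on the training portion only are assumptions made here.

```python
import statistics
from typing import List, Sequence, Tuple


def chronological_split(
    series: Sequence[float], train_fraction: float = 0.7
) -> Tuple[List[float], List[float]]:
    """Split without shuffling, so every test point lies after all training points."""
    cut = round(len(series) * train_fraction)
    return list(series[:cut]), list(series[cut:])


def fit_zscore(train: Sequence[float]) -> Tuple[float, float]:
    """Estimate mean and population standard deviation from training data only."""
    mean = statistics.mean(train)
    std = statistics.pstdev(train)
    if std == 0:
        raise ValueError("cannot normalize a constant series")
    return mean, std


def apply_zscore(values: Sequence[float], mean: float, std: float) -> List[float]:
    return [(v - mean) / std for v in values]


# Hypothetical daily temperature readings in degrees Celsius.
readings = [18.0, 19.0, 20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0, 27.0]

train, test = chronological_split(readings)
mean, std = fit_zscore(train)
train_z = apply_zscore(train, mean, std)
test_z = apply_zscore(test, mean, std)  # reuses training statistics, no look-ahead
```

Fitting the statistics on the training portion alone keeps information from later dates out of the model inputs, which is the usual reason for preferring a chronological split on time series.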

Circularity Check

0 steps flagged

No circularity detected; standard supervised regression on held-out data

The paper applies LSTM RNNs (and baselines SVR, random forest) to predict yield and stem growth from microclimate variables plus lagged historical measurements. All reported results are obtained by training on one portion of the two-greenhouse dataset and evaluating on held-out test data using MSE; no equations, fitted parameters, or self-citations are presented that would make any prediction equivalent to its inputs by construction. The derivation chain is therefore a conventional data-driven regression task whose validity is assessed externally to the fitted model.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on the unstated premise that the recorded microclimate and growth time series are representative and that standard LSTM training will generalize to future seasons; no free parameters, axioms, or invented entities are explicitly introduced in the abstract.
