A Bayesian Updating Framework for Long-term Multi-Environment Trial Data in Plant Breeding

Hans-Peter Piepho; Maryna Prus; Stephan Bark; Volker Schmid; Waqas Ahmed Malik

arXiv: 2604.16203 · v1 · submitted 2026-04-17 · stat.ME · stat.AP · stat.ML


Pith reviewed 2026-05-10 07:45 UTC · model grok-4.3

classification: stat.ME · stat.AP · stat.ML

keywords: Bayesian updating, multi-environment trials, variance components, linear mixed models, historical data, MCMC, plant breeding, conjugate priors


The pith

Bayesian updating with historical windows stabilizes variance component estimates in multi-environment plant trials.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a Bayesian linear mixed model for long-term multi-environment trial (MET) data in plant breeding that systematically incorporates historical information. Past data are managed in successive windows that set the priors. MCMC sampling then produces full posterior distributions for the variance components while keeping them strictly positive. The priors and posteriors for the variance components are shown to be conjugate and to belong to the inverse gamma and inverse Wishart families. A sympathetic reader would care because conventional REML estimation frequently shrinks key variance components to exactly zero. Such estimates give an unrealistic picture of how genotypes respond across environments. The resulting posterior samples are then plugged into an A-optimality criterion to guide the allocation of future trials across agro-ecological zones.
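The MCMC step can be illustrated on a much smaller model than the paper's. The sketch below is a Gibbs sampler for a hypothetical balanced one-way random-effects model with genotype effects plus residual. It uses a flat prior on the mean and inverse-gamma priors on both variances. The function names and default hyperparameters are illustrative assumptions, not the authors' code. Each variance is drawn from its conjugate inverse-gamma full conditional, so every draw is strictly positive. This is the property the paper contrasts with REML estimates that collapse to zero.

```python
import numpy as np


def sample_inv_gamma(rng, shape, scale):
    """Draw from InvGamma(shape, scale): if X ~ Gamma(shape, 1), then scale / X is InvGamma."""
    return scale / rng.gamma(shape)


def gibbs_one_way(y, a_g=2.0, b_g=1.0, a_e=2.0, b_e=1.0, n_iter=2000, seed=0):
    """Gibbs sampler for y[i, j] = mu + g[i] + e[i, j] (hypothetical balanced toy model).

    g[i] ~ N(0, s2_g), e[i, j] ~ N(0, s2_e), flat prior on mu,
    InvGamma(a_g, b_g) and InvGamma(a_e, b_e) priors on the two variances.
    Returns an (n_iter, 2) array of (s2_g, s2_e) draws.
    """
    rng = np.random.default_rng(seed)
    n_gen, n_rep = y.shape
    mu, g = y.mean(), np.zeros(n_gen)
    s2_g, s2_e = 1.0, 1.0
    draws = np.empty((n_iter, 2))
    for t in range(n_iter):
        # g_i | rest: normal with precision n_rep / s2_e + 1 / s2_g
        prec = n_rep / s2_e + 1.0 / s2_g
        mean_g = ((y - mu).sum(axis=1) / s2_e) / prec
        g = mean_g + rng.standard_normal(n_gen) / np.sqrt(prec)
        # mu | rest: with a flat prior, normal around the mean of y - g
        mu = (y - g[:, None]).mean() + rng.standard_normal() * np.sqrt(s2_e / y.size)
        # Conjugate inverse-gamma full conditionals keep both variances > 0
        s2_g = sample_inv_gamma(rng, a_g + n_gen / 2, b_g + 0.5 * np.sum(g**2))
        resid = y - mu - g[:, None]
        s2_e = sample_inv_gamma(rng, a_e + y.size / 2, b_e + 0.5 * np.sum(resid**2))
        draws[t] = (s2_g, s2_e)
    return draws


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # 30 genotypes x 4 replicates; small genotypic variance (0.09) relative to residual (1.0)
    y = 5 + rng.normal(0, 0.3, size=(30, 1)) + rng.normal(0, 1.0, size=(30, 4))
    draws = gibbs_one_way(y)
    print(draws[500:].mean(axis=0))  # posterior means of (s2_g, s2_e) after burn-in
```

A small true genotypic variance is exactly the case in which a REML fit might return zero. The sampler instead reports a full, positive posterior whose spread reflects how little the data say about that component.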

Core claim

The central claim is that a Bayesian reformulation of the linear mixed model keeps the variance components positive and delivers realistic distributional estimates through MCMC sampling. The priors in this reformulation are informed by successive windows of historical data. The prior and posterior distributions are conjugate and belong to the inverse gamma and inverse Wishart families. Together, these properties allow historical MET information to be integrated objectively. This improves variance component estimation and, downstream, experimental design.

What carries the argument

Successive historical data windows that supply conjugate priors for variance components in a Bayesian linear mixed model, with MCMC sampling from the resulting posteriors.
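One hypothetical reading of the windowed updating can be sketched for a single variance component. Conditional on the residuals, as within one Gibbs step, an inverse-gamma prior updates in closed form, and the posterior from one window becomes the prior for the next. When only MCMC draws are available, a method-of-moments fit to an inverse gamma is one simple way to pass them forward. All function names here are illustrative assumptions, not the paper's published procedure.

```python
import numpy as np


def update_inv_gamma(a, b, resid):
    """Conjugate update of InvGamma(a, b) for a variance, given zero-mean normal deviations."""
    resid = np.asarray(resid, dtype=float)
    return a + resid.size / 2, b + 0.5 * np.sum(resid**2)


def inv_gamma_from_draws(draws):
    """Moment-match InvGamma(a, b) to posterior draws with mean m and variance v.

    For InvGamma, mean = b / (a - 1) and variance = m**2 / (a - 2),
    so a = m**2 / v + 2 and b = m * (a - 1).
    """
    draws = np.asarray(draws, dtype=float)
    m, v = draws.mean(), draws.var()
    a = m**2 / v + 2
    return a, m * (a - 1)


def sequential_windows(a0, b0, windows):
    """Carry the posterior from window k forward as the prior for window k + 1."""
    a, b = a0, b0
    history = []
    for resid in windows:
        a, b = update_inv_gamma(a, b, resid)
        history.append((a, b))
    return history
```

Under this exact conjugate update the order of windows does not matter. The final posterior equals a single update on the pooled data. Any benefit of windowing beyond pooling therefore depends on how windows are weighted or truncated, which this sketch does not model.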

If this is right

Variance components remain strictly positive and carry full distributional information rather than point estimates that can collapse to zero.
Historical data are incorporated through an objective windowing procedure that updates priors for the current analysis.
Posterior samples enable direct use in optimality criteria such as A-optimality for determining trial allocations to agro-ecological zones.
The conjugacy of the inverse gamma and inverse Wishart families simplifies the Bayesian updating calculations.
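The matrix analogue of the previous update is the conjugate inverse-Wishart step. The sketch below assumes M independent zero-mean normal random-effect vectors of dimension Z, for example one genotype's effects across Z zones, with an InvWishart(ν, S) prior on their covariance. The function names are hypothetical.

```python
import numpy as np


def update_inv_wishart(nu, S, B):
    """Conjugate update of InvWishart(nu, S) for a Z x Z covariance matrix.

    B has shape (M, Z): one random-effect vector b_g ~ N(0, Sigma_Z) per row.
    The posterior is InvWishart(nu + M, S + sum_g b_g b_g^T).
    """
    B = np.atleast_2d(np.asarray(B, dtype=float))
    return nu + B.shape[0], S + B.T @ B  # B.T @ B == sum of outer products b_g b_g^T


def inv_wishart_mean(nu, S):
    """Mean of InvWishart(nu, S), defined when nu > Z + 1."""
    Z = S.shape[0]
    return S / (nu - Z - 1)
```

S is positive definite and the sum of outer products is positive semidefinite, so the updated scale matrix stays positive definite. This is the covariance-matrix counterpart of keeping scalar variances strictly positive.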

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The windowed updating scheme could be tested for robustness by varying window lengths and checking sensitivity of the resulting allocations.
Improved variance estimates may allow more reliable extrapolation of genotype performance to future or untested environments within the target population.
The framework might extend naturally to other accumulating longitudinal datasets where variance components need to be kept positive and informed by history.

Load-bearing premise

Historical MET data can be partitioned into successive windows that objectively inform priors without introducing bias or temporal mismatch between past and current environments.

What would settle it

If independent validation data show that variance components and trial allocations derived from the Bayesian-updated posteriors yield poorer predictive accuracy for genotype performance across environments than those from standard REML estimation, the practical advantage would be falsified.

Figures

Figures reproduced from arXiv: 2604.16203 by Hans-Peter Piepho, Maryna Prus, Stephan Bark, Volker Schmid, Waqas Ahmed Malik.

**Figure 2.** Histograms of the drawn MCMC posterior samples after the fifth multi-year …

**Figure 3.** Histograms of the drawn MCMC posterior samples of all successive sampled …

**Figure 4.** Histograms of the drawn MCMC posterior samples for variance-covariance …

**Figure 3 (caption, continued).** Here, the REML estimate of the var gen year component is relatively small at 0.0246, and the REML estimate of the var zone year component is exactly zero. Over the windows, the high posterior density regions shift progressively toward the REML point estimate, although they never fully capture it. Increasing the amount of data and the number of data windows shifts the initial starting point a priori mo…

Original abstract

In variety testing, multi-environment trials (MET) are essential for evaluating the genotypic performance of crop plants. A persistent challenge in the statistical analysis of MET data is the estimation of variance components, which are often still inaccurately estimated or shrunk to exactly zero when using residual (restricted) maximum likelihood (REML) approaches. At the same time, institutions conducting MET typically possess extensive historical data that can, in principle, be leveraged to improve variance component estimation. However, these data are rarely incorporated sufficiently. The purpose of this paper is to address this gap by proposing a Bayesian framework that systematically integrates historical information to stabilize variance component estimation and better quantify uncertainty. Our Bayesian linear mixed model (BLMM) reformulation uses priors and Markov chain Monte Carlo (MCMC) methods to maintain the variance components as positive, yielding more realistic distributional estimates. Furthermore, our model incorporates historical prior information by managing MET data in successive historical data windows. Variance component prior and posterior distributions are shown to be conjugate and belong to the inverse gamma and inverse Wishart families. While Bayesian methodology is increasingly being used for analyzing MET data, to the best of our knowledge, this study comprises one of the first serious attempts to objectively inform priors in the context of MET data. This refers to the proposed Bayesian updating approach. To demonstrate the framework, we consider an application where posterior variance component samples are plugged into an A-optimality experimental design criterion to determine the average optimal allocations of trials to agro-ecological zones in a sub-divided target population of environments (TPE).
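The last step of the abstract can be illustrated with a deliberately simplified, hypothetical criterion. The paper's A-optimality criteria are derived for a specific data structure and model, so the toy version below is not them. Here each zone z contributes τ²_z / n_z to the trace, where τ²_z stands in for the relevant variance components and n_z is the number of trials in that zone. For each posterior draw the sketch computes the optimal integer allocation of a fixed trial budget, then averages the allocations across draws. This is one reading of "average optimal allocations."

```python
import numpy as np


def a_criterion(tau2, n):
    """Toy A-criterion: trace of diag(tau2 / n), the summed variances of the zone means."""
    return float(np.sum(np.asarray(tau2, dtype=float) / np.asarray(n, dtype=float)))


def optimal_allocation(tau2, n_total):
    """Integer allocation minimising sum(tau2 / n) subject to sum(n) == n_total and n >= 1.

    Adding trials one at a time to the zone with the largest variance reduction is exact
    here, because each term tau2_z / n_z is convex and decreasing in n_z.
    """
    tau2 = np.asarray(tau2, dtype=float)
    n = np.ones(tau2.size, dtype=int)
    for _ in range(n_total - tau2.size):
        gain = tau2 / n - tau2 / (n + 1)  # reduction from one extra trial in each zone
        n[np.argmax(gain)] += 1
    return n


def average_optimal_allocation(tau2_draws, n_total):
    """Average the per-draw optimal allocations over posterior samples (one row per draw)."""
    allocs = np.array([optimal_allocation(t, n_total) for t in np.asarray(tau2_draws)])
    return allocs.mean(axis=0)
```

Every posterior draw is strictly positive, so no zone's variance is ever treated as exactly zero. That is the situation in which plugging in a single REML point estimate could distort the allocation.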

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper's Bayesian windowed updating for MET variance priors is a practical step for plant breeding stats but rests on an untested stationarity assumption that could bias the results.


Hi, the core idea is a Bayesian linear mixed model for multi-environment trial data that uses successive historical windows to set conjugate inverse-gamma and inverse-Wishart priors on the variance components. MCMC sampling produces strictly positive variances with full posterior distributions, instead of the zero estimates that REML can give. The posteriors are then plugged into an A-optimality criterion for allocating trials across agro-ecological zones in a subdivided target population of environments. What is new is the attempt to make the prior choice objective by sliding through past data rather than picking priors subjectively, and the conjugacy keeps the updating tractable. The paper does well to connect the estimation problem directly to a downstream design task that breeders actually face.

The main soft spot is the implicit stationarity assumption. Variance components or the structure of the target environments may shift over time because of climate trends, management changes, or breeding progress. If they do, earlier windows will pull the priors in the wrong direction, and the posteriors may not be calibrated for current conditions. The abstract and description give no sign of sensitivity checks on window length or of simulations under drift, so that part of the claim is provisional. The design application is a good touch, but it lacks side-by-side numbers against standard REML or other Bayesian setups on the same data, which makes the practical gain hard to judge.

This is aimed at agricultural statisticians and plant breeders who work with long-term MET data and want better uncertainty quantification for variances. A reader in that niche will get value from the setup and the concrete example. It deserves serious peer review because the proposal is workable and targets a real estimation issue, even if referees will need to push on the robustness checks and validation.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a Bayesian updating framework for long-term multi-environment trial (MET) data in plant breeding. It reformulates the linear mixed model using Bayesian methods with MCMC to ensure positive variance components, manages historical data in successive windows to set conjugate priors from the inverse-gamma and inverse-Wishart families, and demonstrates the approach by using posterior samples in an A-optimality criterion for allocating trials across agro-ecological zones in a target population of environments (TPE).

Significance. Should the conjugacy properties and unbiased updating hold, the framework offers a principled way to incorporate historical MET data into variance component estimation, potentially yielding more stable and realistic estimates than REML while properly quantifying uncertainty. This could have practical impact in plant breeding programs with extensive historical datasets.

major comments (2)

[Methods / Model formulation] The central claim that variance component prior and posterior distributions are conjugate and belong to the inverse-gamma and inverse-Wishart families (Abstract) requires an explicit model specification and derivation in the methods section; without the linear mixed model equations and updating rules, it is impossible to verify whether the historical-window data produces the asserted conjugacy.
[Framework description] The Bayesian updating scheme relies on partitioning MET data into successive historical windows to inform priors without temporal bias (Abstract and framework description). This assumption is load-bearing for the central claim; the manuscript does not address or test for non-stationarity in variance components (G, E, GxE) across windows due to climate trends or breeding progress, which risks miscalibrated posteriors.

minor comments (2)

The application to A-optimality experimental design is mentioned but lacks any numerical results, simulation details, or real-data validation showing improved allocations.
Clarify the objective criteria used to define and select the successive historical data windows, including any sensitivity checks.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive review. The comments highlight important areas for clarification and strengthening of the manuscript. We address each major comment below and outline the revisions we will implement.

Point-by-point responses

Referee: [Methods / Model formulation] The central claim that variance component prior and posterior distributions are conjugate and belong to the inverse-gamma and inverse-Wishart families (Abstract) requires an explicit model specification and derivation in the methods section; without the linear mixed model equations and updating rules, it is impossible to verify whether the historical-window data produces the asserted conjugacy.

Authors: We agree that the conjugacy claim requires a fully explicit derivation to allow verification. The current manuscript states the result but does not present the complete linear mixed model equations together with the prior-to-posterior updating rules for the inverse-gamma (error and genotype variances) and inverse-Wishart (genotype-by-environment covariance) distributions. In the revised version we will add a dedicated subsection in Methods that (i) writes the full BLMM, (ii) specifies the conjugate priors, and (iii) derives the closed-form posterior parameters after incorporating each successive historical window. This will make the conjugacy transparent and directly address the referee’s concern. revision: yes
Referee: [Framework description] The Bayesian updating scheme relies on partitioning MET data into successive historical windows to inform priors without temporal bias (Abstract and framework description). This assumption is load-bearing for the central claim; the manuscript does not address or test for non-stationarity in variance components (G, E, GxE) across windows due to climate trends or breeding progress, which risks miscalibrated posteriors.

Authors: We acknowledge that the stationarity assumption across historical windows is central and that the manuscript does not currently examine potential non-stationarity arising from climate trends or genetic progress. In the revision we will (i) explicitly state the stationarity assumption and the rationale for window length selection, (ii) add a short discussion of possible sources of non-stationarity, and (iii) include a limited sensitivity analysis (either on real data subsets or via simulation) that perturbs variance components across windows and reports the resulting change in posterior means and credible intervals. These additions will quantify the robustness of the updating procedure under mild departures from stationarity. revision: yes
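A minimal simulation of the kind of sensitivity analysis promised here can be written for a single residual variance. It also covers the window-length check suggested earlier. Every ingredient is a hypothetical choice, not the authors' design: the linear drift, the window sizes, the InvGamma(2, 1) starting prior, and pooling by keeping only the last k windows.

```python
import numpy as np


def posterior_mean_last_k(windows, k, a0=2.0, b0=1.0):
    """Posterior mean of a variance after conjugate InvGamma(a0, b0) updates on the last k windows."""
    a, b = a0, b0
    for resid in windows[-k:]:
        a += resid.size / 2
        b += 0.5 * np.sum(resid**2)
    return b / (a - 1)


def drift_sensitivity(drift, n_windows=10, n_per_window=50, seed=0):
    """Simulate zero-mean residuals whose variance grows by `drift` per window.

    Returns the true variance in the latest window and a dict mapping k (number of
    most recent windows pooled into the posterior) to the resulting posterior mean.
    """
    rng = np.random.default_rng(seed)
    true_var = 1.0 + drift * np.arange(n_windows)
    windows = [rng.normal(0.0, np.sqrt(v), size=n_per_window) for v in true_var]
    means = {k: posterior_mean_last_k(windows, k) for k in range(1, n_windows + 1)}
    return float(true_var[-1]), means
```

Comparing `means` across a few drift values shows how far the pooled posterior lags behind the latest window as more history is included. Deciding how much lag is acceptable remains a modelling choice that the sketch leaves open.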

Circularity Check

0 steps flagged

No significant circularity: historical windows supply external priors; conjugacy is algebraic under stated model

full rationale

The derivation uses successive historical MET data windows to construct conjugate inverse-gamma and inverse-Wishart priors for variance components in a Bayesian linear mixed model, then updates via MCMC. This is a standard Bayesian updating step in which earlier data serve as external input rather than being re-fitted or renamed as a prediction from the target data. No self-citation chain, self-definitional loop, or ansatz smuggling is present in the abstract or claimed framework; the conjugacy result follows directly from the chosen likelihood-prior pair and does not reduce the final posterior estimates to quantities already obtained by construction from the same observations. The approach remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The approach rests on standard Bayesian conjugacy for linear mixed models and the assumption that historical data windows provide unbiased prior information; no new entities are introduced.

axioms (2)

Domain assumption: Variance components in linear mixed models for MET data follow inverse-gamma or inverse-Wishart distributions that are conjugate to the likelihood.
Invoked when stating that prior and posterior belong to these families.
Ad hoc to paper: Historical MET data can be managed in successive windows to update priors without temporal bias.
Central to the Bayesian updating claim but not justified in the abstract.
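Under the domain assumption above, conjugacy is a short computation once one conditions on the residuals and random effects, which is the setting of a Gibbs step. The following is a standard derivation with notation chosen for illustration: IG(a, b) for a scalar variance with n residuals r_i, and IW(ν, S) for a Z x Z covariance with M effect vectors b_g. It is not a transcription of the paper's equations.

```latex
% Inverse gamma: prior \sigma^2 \sim \mathrm{IG}(a, b), residuals r_i \sim N(0, \sigma^2), i = 1..n
p(\sigma^2 \mid \mathbf{r}) \propto (\sigma^2)^{-(a+1)} \exp\{-b/\sigma^2\}
  \prod_{i=1}^{n} (\sigma^2)^{-1/2} \exp\{-r_i^2/(2\sigma^2)\}
  = (\sigma^2)^{-(a + n/2 + 1)} \exp\Big\{-\Big(b + \tfrac12 \textstyle\sum_{i} r_i^2\Big)\Big/\sigma^2\Big\}
\;\Rightarrow\; \sigma^2 \mid \mathbf{r} \sim \mathrm{IG}\big(a + \tfrac{n}{2},\; b + \tfrac12 \textstyle\sum_{i} r_i^2\big)

% Inverse Wishart: prior \Sigma_Z \sim \mathrm{IW}(\nu, S), effects b_g \sim N_Z(0, \Sigma_Z), g = 1..M
p(\Sigma_Z \mid \mathbf{b}) \propto \det(\Sigma_Z)^{-(\nu+Z+1)/2} \exp\{-\tfrac12 \operatorname{tr}(S\Sigma_Z^{-1})\}
  \prod_{g=1}^{M} \det(\Sigma_Z)^{-1/2} \exp\{-\tfrac12 b_g^\top \Sigma_Z^{-1} b_g\}
  = \det(\Sigma_Z)^{-(\nu+Z+M+1)/2} \exp\Big\{-\tfrac12\Big[\operatorname{tr}(S\Sigma_Z^{-1}) + \sum_{g=1}^{M} b_g^\top \Sigma_Z^{-1} b_g\Big]\Big\}
\;\Rightarrow\; \Sigma_Z \mid \mathbf{b} \sim \mathrm{IW}\big(\nu + M,\; S + \textstyle\sum_{g=1}^{M} b_g b_g^\top\big)
```

The last step uses b_g^T Σ_Z^{-1} b_g = tr(b_g b_g^T Σ_Z^{-1}), which folds the data term into the scale matrix.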

pith-pipeline@v0.9.0 · 5600 in / 1365 out tokens · 52027 ms · 2026-05-10T07:45:55.722584+00:00 · methodology


