Uncertainty in the MAN Data Calibration & Trend Estimates

Jaap Hanekamp; William M. Briggs

arxiv: 1907.10173 · v1 · pith:Q25MHKRCnew · submitted 2019-07-23 · 📊 stat.AP · physics.ao-ph

Pith reviewed 2026-05-24 16:44 UTC · model grok-4.3

keywords atmospheric ammonia · trend identification · calibration uncertainty · MAN data · LML data · data imputation · environmental monitoring · measurement error propagation

The pith

Calibration of MAN ammonia data to LML introduces uncertainty that halves the number of detected trends when propagated.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that the MAN atmospheric ammonia measurements are calibrated against the LML series, yet this step injects error that prior trend analyses ignored entirely. When the uncertainty is carried forward into trend calculations, roughly half the previously reported trends disappear. Missing values at MAN stations are also filled in by imputation, which again shifts the count and direction of trends. A reader would care because these trends are used to judge changes in air quality and farm emissions, and overstated certainty could distort regulatory conclusions. The LML series itself shows roughly equal numbers of upward, downward, and flat trends whose sign flips with the choice of start year.

Core claim

The calibration step that aligns MAN stations to the LML reference series adds measurement uncertainty never before propagated into trend tests. When this uncertainty is included, the number of statistically significant trends in the MAN record falls by about half. Filling in missing observations at the MAN sites further alters the tally, producing more positive trends and fewer significant ones. The sign and significance of trends therefore depend on whether calibration error and imputation are acknowledged. The LML series alone already contains mixed positive, negative, and null trends whose detection changes with the start date chosen for analysis.

What carries the argument

Propagation of calibration uncertainty from the MAN-to-LML regression into subsequent trend tests, together with imputation of missing MAN values.

If this is right

Trends identified in the MAN data change sign and lose significance once calibration uncertainty is included.
Imputation of missing MAN observations increases the number of positive trends while decreasing the number judged significant.
The choice of start date for any trend calculation in the LML series can reverse the direction or significance of the result.
Current published counts of ammonia trends rest on an incomplete error budget.
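The start-date point above can be illustrated with a minimal sketch. It assumes an ordinary least-squares trend with a t-statistic on the slope, which may differ from the test the paper actually uses; the synthetic series and the function names `trend_slope_and_t` and `trends_by_start` are hypothetical and are not taken from the paper or the LML data.

```python
import numpy as np

def trend_slope_and_t(y):
    """OLS slope of y on its time index, plus the slope's t-statistic."""
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y), dtype=float)
    tc = t - t.mean()                      # centred time index
    sxx = np.sum(tc ** 2)
    slope = np.sum(tc * (y - y.mean())) / sxx
    resid = (y - y.mean()) - slope * tc    # residuals of the fitted line
    se = np.sqrt(np.sum(resid ** 2) / (len(y) - 2) / sxx)
    return slope, slope / se

def trends_by_start(y, min_len=5):
    """Refit the trend from every start index that leaves at least min_len points."""
    return [(s, *trend_slope_and_t(y[s:])) for s in range(len(y) - min_len + 1)]

# Hypothetical yearly series: rises for ten years, then falls, with a small wiggle.
years = np.arange(20)
y = np.where(years < 10, years, 20 - years) + 0.3 * (-1.0) ** years

for start, slope, t_stat in trends_by_start(y):
    print(f"start={start:2d}  slope={slope:+.3f}  t={t_stat:+.2f}")
```

On this made-up series the full-record slope is slightly positive and weak, while a fit that starts at year 10 is strongly negative, which is the kind of reversal the list describes.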

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar unpropagated calibration steps may exist in other national air-quality networks and could be checked with the same propagation technique.
Regulatory reports that cite trend counts without calibration uncertainty should be revisited to show wider confidence intervals.
Repeating the analysis on future data releases would test whether the 50 percent reduction persists as the record lengthens.

Load-bearing premise

The LML measurements can be treated as an error-free reference against which only the MAN calibration error needs to be added.

What would settle it

An independent side-by-side comparison that shows the LML instrument error is comparable in size to the calibration adjustment would remove the justification for treating LML as the fixed standard.
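One way to make the premise concrete is a simple variance decomposition. This is a sketch under assumptions the paper does not state: a linear calibration of the LML reading on the MAN reading, and a residual made of two uncorrelated parts. The symbols $\alpha$, $\beta$, $c_i$, $e_i$, $\sigma_c^2$ and $\sigma_L^2$ are introduced here for illustration only.

```latex
\begin{align}
  y_i^{\mathrm{LML}} &= \alpha + \beta\, x_i^{\mathrm{MAN}} + \varepsilon_i, \\
  \varepsilon_i &= c_i + e_i, \qquad \operatorname{Cov}(c_i, e_i) = 0, \\
  \operatorname{Var}(\varepsilon_i) &= \sigma_c^2 + \sigma_L^2 .
\end{align}
```

Here $c_i$ stands for the calibration discrepancy and $e_i$ for LML instrument error. Treating LML as error-free amounts to setting $\sigma_L^2 = 0$, so the whole residual variance is attributed to calibration. If a side-by-side comparison showed $\sigma_L^2$ to be comparable to $\sigma_c^2$, that attribution would overstate the calibration variance.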

Figures

Figures reproduced from arXiv: 1907.10173 by Jaap Hanekamp, William M. Briggs.

**Figure 2.** The calibration exercise for 15 September 2011. The open circles …

**Figure 3.** The original and calibrated MAN data from Fig. 2. The open …

**Figure 4.** The 90% calibration predictive intervals for MAN data from Fig. …

**Figure 5.** This shows the calibrated MAN data against the original LML …

**Figure 6.** The monthly data of each LML station, with a regression trend …

**Figure 7.** This graph shows how varying the start date of the trend analysis …

**Figure 8.** Yearly mean values of NH3 with trends computed as in Fig. 6. …

**Figure 9.** Monthly trend coefficients were estimated from the raw, cali…

**Figure 10.** This is similar to and clarifies Fig. 9, except that this is a station …

**Figure 11.** The number of positive (black) and negative (red) significant …

**Figure 12.** The number of significant trends identified in the simple and …

Original abstract

We investigate trend identification in the LML and MAN atmospheric ammonia data. The signals are mixed in the LML data, with just as many positive, negative, and no trends found. The start date for trend identification is crucial, with the trends claimed changing sign and significance depending on the start date. The MAN data is calibrated to the LML data. This calibration introduces uncertainty never heretofore accounted for in any downstream analysis, such as identifying trends. We introduce a method to do this, and find that the number of trends identified in the MAN data drop by about 50%. The missing data at MAN stations is also imputed; we show that this imputation again changes the number of trends identified, with more positive and fewer significant trends claimed. The sign and significance of the trends identified in the MAN data change with the introduction of the calibration and then again with the imputation. The conclusion is that great over-certainty exists in current methods of trend identification.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper examines trend detection in LML and MAN atmospheric ammonia measurements. It reports mixed positive/negative/no-trend signals in the LML series whose sign and significance depend on start date. MAN data are calibrated to LML; the authors introduce a method to propagate the resulting calibration uncertainty (previously ignored) and state that the number of detected trends falls by ~50 %. Imputation of missing MAN values is also shown to alter trend counts and significance. The conclusion is that existing trend analyses are over-certain.

Significance. If the calibration-uncertainty propagation is shown to be correctly formulated and the 50 % reduction is robust to reasonable variations in the LML error model, the work would demonstrate that calibration steps can materially affect downstream trend counts in atmospheric monitoring networks and would motivate routine inclusion of such uncertainties in future analyses.

major comments (2)

[Abstract] Abstract (calibration paragraph): the central numerical claim—a 50 % drop in identified trends—is presented without any equations, description of the uncertainty distribution, or propagation procedure, so the result cannot be verified or reproduced from the given text.
[Abstract] Abstract (calibration paragraph): the method treats LML observations as an error-free reference standard; if LML measurements carry their own error or temporal variability, the effective calibration variance is smaller than assumed and the reported attenuation of trend counts would be overstated.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed report and the opportunity to respond. The two major comments both concern the abstract's treatment of the calibration-uncertainty results. We address each below and indicate the revisions we will make.

Point-by-point responses

Referee: [Abstract] Abstract (calibration paragraph): the central numerical claim—a 50 % drop in identified trends—is presented without any equations, description of the uncertainty distribution, or propagation procedure, so the result cannot be verified or reproduced from the given text.

Authors: We agree that the abstract is too terse to allow verification of the 50 % figure. The calibration uncertainty is modeled as additive Gaussian noise whose variance is estimated from the residuals of the MAN-LML regression; this noise is then propagated by Monte Carlo resampling of the MAN series before trend fitting. The resulting distribution of trend counts is what yields the reported ~50 % reduction. We will revise the abstract to include a one-sentence description of the uncertainty model and the Monte Carlo procedure so that the numerical claim can be understood without consulting the methods section.

Revision: yes

Referee: [Abstract] Abstract (calibration paragraph): the method treats LML observations as an error-free reference standard; if LML measurements carry their own error or temporal variability, the effective calibration variance is smaller than assumed and the reported attenuation of trend counts would be overstated.

Authors: The LML series is used as the reference because it is the higher-precision, co-located instrument against which the MAN calibration factors are derived; this is the standard practice in the network. We nevertheless accept that any unaccounted LML measurement error would reduce the effective calibration variance and could therefore overstate the attenuation of trend counts. We will add an explicit statement of this modeling assumption in the revised abstract and methods, together with a short sensitivity discussion in the text that explores the effect of modest LML error on the reported 50 % reduction.

Revision: partial
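The procedure described in the first simulated response (additive Gaussian noise with variance taken from the MAN-LML regression residuals, then Monte Carlo resampling before trend fitting) can be sketched as follows. This follows the simulated rebuttal's wording, not the paper's own code. It also assumes an OLS trend with a fixed |t| > 2 cutoff and ignores uncertainty in the fitted calibration coefficients. All data and the names `ols` and `significant_fraction` are hypothetical.

```python
import numpy as np

def ols(x, y):
    """Return intercept, slope, residual SD, and the slope's t-statistic."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    sxx = np.sum(xc ** 2)
    slope = np.sum(xc * (y - y.mean())) / sxx
    intercept = y.mean() - slope * x.mean()
    resid = y - (intercept + slope * x)
    sd = np.sqrt(np.sum(resid ** 2) / (len(x) - 2))
    return intercept, slope, sd, slope / (sd / np.sqrt(sxx))

def significant_fraction(man_raw, a, b, sigma_cal, n_draws=1000, t_crit=2.0, seed=0):
    """Share of Monte Carlo draws whose trend t-statistic exceeds t_crit in magnitude."""
    rng = np.random.default_rng(seed)
    t = np.arange(len(man_raw))
    calibrated = a + b * np.asarray(man_raw, dtype=float)
    hits = 0
    for _ in range(n_draws):
        # Assumed noise model: independent Gaussian calibration error per observation.
        draw = calibrated + rng.normal(0.0, sigma_cal, size=len(calibrated))
        hits += int(abs(ols(t, draw)[3]) > t_crit)
    return hits / n_draws

# Hypothetical co-located pairs used to fit the MAN-to-LML calibration line.
man_pairs = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
lml_pairs = 0.5 + 0.9 * man_pairs + np.array([0.2, -0.2, 0.1, -0.1, 0.2, -0.2])
a, b, sigma_cal, _ = ols(man_pairs, lml_pairs)

# Hypothetical MAN station series with a weak downward trend.
months = np.arange(24)
station = 5 - 0.05 * months + 0.3 * (-1.0) ** months

print("plain t-statistic:", ols(months, a + b * station)[3])
print("share significant with calibration noise:",
      significant_fraction(station, a, b, sigma_cal))
```

With `sigma_cal` set to zero every draw equals the calibrated series, so the result reduces to the usual single trend test; larger values make a trend that looked significant fail the cutoff in a growing share of draws.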

Circularity Check

0 steps flagged

No significant circularity; derivation applies external calibration uncertainty to trend counts

Full rationale

The paper introduces a propagation method for MAN-LML calibration discrepancy and reports its effect on trend counts as an empirical result. No load-bearing step reduces by definition to a fitted parameter, self-citation chain, or ansatz smuggled from prior work by the same authors. The central claim (roughly 50% fewer trends) is obtained by applying the new procedure to the data rather than being forced tautologically from the inputs themselves. The analysis therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract supplies no explicit free parameters, axioms, or invented entities; the calibration step is treated as introducing new uncertainty without stating the functional form or distributional assumptions used to propagate it.

pith-pipeline@v0.9.0 · 5688 in / 1101 out tokens · 16245 ms · 2026-05-24T16:44:07.586379+00:00 · methodology

Reference graph

Works this paper leans on

23 extracted references · 23 canonical work pages

[1] V. Amrhein, S. Greenland, and B. McShane. Scientists rise up against statistical significance. Nature, 567:305–307, 2019.

[2] V. Amrhein, F. Korner-Nievergelt, and T. Roth. The earth is flat (p < 0.05): significance thresholds and the crisis of unreplicable research. PeerJ, 5:e3544, 2017.

[3] D. Benjamin, J. Berger, M. Johannesson, B. Nosek, E. Wagenmakers, R. Berk, et al. Redefine statistical significance. Nat. Hum. Behav., 2:6–10, 2018.

[4] J. O. Berger and T. Sellke. Testing a point null hypothesis: the irreconcilability of p-values and evidence. JASA, 33:112–122, 1987.

[5] W. M. Briggs. Uncertainty: The Soul of Probability, Modeling & Statistics. Springer, New York, 2016.

[6] W. M. Briggs. Everything wrong with p-values under one roof. In V. Kreinovich, N. Thach, N. Trung, and D. Thanh, editors, Beyond Traditional Probabilistic Methods in Economics, pages 22–44. Springer, New York, 2019.

[7] R. J. Carroll, D. Ruppert, L. A. Stefanski, and C. M. Crainiceanu. Measurement Error in Nonlinear Models: A Modern Perspective. Chapman and Hall, London, 2006.

[8] S. Geisser. Predictive Inference: An Introduction. Chapman & Hall, New York, 1993.

[9] N. M. Gibbs and S. V. Gibbs. Misuse of ‘trend’ to describe ‘almost significant’ differences in anaesthesia research. British Journal of Anaesthesia, 115(3):337–339, 2004.

[10] G. Gigerenzer. Mindless statistics. The Journal of Socio-Economics, 33:587–606, 2004.

[11] F. E. Harrell. Regression Modeling Strategies. Springer, New York, 2001.

[12] R. Hubbard and R. M. Lindsay. Why p values are not a useful measure of evidence in statistical significance testing. Theory & Psychology, 18:69–88, 2008.

[13] W. O. Johnson and S. Geisser. A predictive view of the detection and characterization of influence observations in regression analysis. JASA, 78:427–440, 1982.

[14] D. E. Lolkema, H. Noordijk, A. P. Stolk, R. Hoogerbrugge, M. C. van Zanten, and W. A. J. van Pul. The measuring ammonia in nature (MAN) network in the Netherlands. Biogeosciences, 12:5133–5142, 2015.

[15] B. B. McShane, D. Gal, A. Gelman, C. Robert, and J. L. Tackett. Abandon statistical significance. The American Statistician, page Forthcoming, 2018.

[16] A. Stolk, H. Noordijk, H. den Hollander, M. van Zanten, R. W. Kruit, and W. van Pul. Het verloop van de ammoniakconcentratie over 2005-

[17] Technical Report RIVM Rapport 2016-0136, Rijksinstituut voor Volksgezondheid en Milieu, Postbus 1, 3720 BA Bilthoven, Nederland, 2016.

[18] M. Sutton, U. Dragosits, C. Geels, S. Gyldenkaerne, T. Misselbrook, and W. Bussink. Review on the scientific underpinning of calculation of ammonia emission and deposition in the Netherlands. Technical report, Rijksoverheid, Nederlands, 2015.

[19] D. Trafimow, V. Amrhein, C. N. Areshenkoff, C. J. Barrera-Causil, E. J. Beh, Y. K. Bilgi, R. Bono, M. T. Bradley, W. M. Briggs, H. A. Cepeda-Freyre, S. E. Chaigneau, D. R. Ciocca, J. C. Correa, D. Cousineau, M. R. de Boer, S. S. Dhar, I. Dolgov, J. Gómez-Benito, M. Grendar, J. W. Grice, M. E. Guerrero-Gimenez, A. Gutiérrez, T. B. Huedo-Medina, K. Jaffe, A. Ja...

[20] M. van Zanten, R. W. Kruit, R. Hoogerbrugge, E. V. der Swaluw, and W. van Pul. Trends in ammonia measurements in the Netherlands over the period 1993–2014. Atmospheric Environment, 148:352–360, 2017.

[21] R. L. Wasserstein. The ASA’s statement on p-values: Context, process, and purpose. American Statistician, 70:129–132, 2016.

[22] R. Wichink-Kruit, R. Hoogerbrugge, F. Sauter, W. de Vries, and W. van Pul. Ontwikkelingen in emissies en concentraties van ammoniak in Nederland tussen 2005 en 2016. Technical Report RIVM Rapport 2018-0163, Rijksinstituut voor Volksgezondheid en Milieu, Bilthoven, Nederland, 2018.

[23] S. T. Ziliak and D. N. McCloskey. The Cult of Statistical Significance. University of Michigan Press, Ann Arbor, 2008.
