On Statistical Properties of A Veracity Scoring Method for Spatial Data

Arnab Chakraborty; Soumendra N. Lahiri

arXiv: 1906.08843 · v1 · pith:6EBYWVMLnew · submitted 2019-06-20 · stat.ME · math.ST · stat.AP · stat.TH

Pith reviewed 2026-05-25 19:02 UTC · model grok-4.3

classification stat.ME · math.ST · stat.AP · stat.TH

keywords veracity scoring · spatial regression · consistency · asymptotic mean squared error · non-stationary noise · geostatistical data · ordinary least squares

The pith

Veracity scoring from local summaries yields consistent regression estimators that beat ordinary least squares under non-stationary noise.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a veracity scoring technique for geostatistical data when no reference measurements are available. Scores are computed from local summaries of the observations and then used to weight a spatial regression estimator. Under non-stationary noise and standard assumptions on the spatial process, the resulting estimators are consistent. Their asymptotic mean squared errors are shown to be smaller than those of ordinary least squares. The claims are checked in simulations and on coal-ash percentage data from Pennsylvania seams.

Core claim

Under a non-stationary noise structure and fairly general assumptions on the underlying spatial process, the VS-based estimators of the regression parameters are consistent. The paper further establishes the advantage of these estimators over ordinary least squares by direct comparison of their asymptotic mean squared errors.

What carries the argument

Veracity scores computed from local summaries of the observations and inserted into a weighted least-squares estimator for the spatial regression parameters.
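A minimal sketch of this weighting, assuming the score form quoted in the passage further down this page, V(s_i) = exp(−|Z(s_i) − C(Z_i)| / (α + D(Z_i))), plugged into a weighted least-squares estimator. The choice of the neighbourhood median for C and the neighbourhood median absolute deviation for D, the k-nearest-neighbour rule, and the names `veracity_scores`, `vs_weighted_ls`, `k`, and `alpha` are illustrative assumptions, not the paper's definitions.

```python
import numpy as np


def veracity_scores(coords, z, k=10, alpha=0.1):
    """Score each observation against a summary of its k nearest neighbours.

    Assumption: local centre = median, local spread = MAD of the neighbours.
    """
    coords = np.asarray(coords, dtype=float)
    z = np.asarray(z, dtype=float)
    # Pairwise Euclidean distances (brute force; fine for small n).
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)            # exclude each point from its own window
    order = np.argsort(dist, axis=1)[:, :k]   # indices of the k nearest neighbours
    neigh = z[order]                          # shape (n, k)
    centre = np.median(neigh, axis=1)
    spread = np.median(np.abs(neigh - centre[:, None]), axis=1)
    return np.exp(-np.abs(z - centre) / (alpha + spread))


def vs_weighted_ls(X, z, v):
    """Weighted least squares (X' D_v X)^{-1} X' D_v z with D_v = diag(v)."""
    X = np.asarray(X, dtype=float)
    Xw = X * np.asarray(v)[:, None]           # rows of X scaled by their scores
    return np.linalg.solve(Xw.T @ X, Xw.T @ z)


# Toy usage: a linear trend with a few grossly corrupted readings.
rng = np.random.default_rng(0)
coords = rng.uniform(size=(200, 2))
X = np.column_stack([np.ones(200), coords[:, 0]])
z = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.2, size=200)
z[:10] += 8.0                                 # corrupted observations
v = veracity_scores(coords, z)
beta_vs = vs_weighted_ls(X, z, v)
```

In this toy setup the corrupted readings sit far from their local median, so their scores are close to zero and they contribute little to the fit.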

If this is right

The VS-based estimators remain consistent for the regression parameters.
The asymptotic mean squared error of each VS-based estimator is smaller than that of the corresponding ordinary least squares estimator.
The method applies directly to geostatistical regression without external reference data.
The same weighting improves finite-sample performance in the reported simulations and coal-seam example.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If mobile-sensor networks routinely exhibit non-stationary noise, the local-summary scores could replace reference-based weighting in many environmental mapping tasks.
The approach invites direct comparison with other robust spatial estimators that also down-weight suspect observations.
Extensions to non-Gaussian or temporally evolving processes would test whether the consistency proof generalizes beyond the current setting.

Load-bearing premise

Local summaries can be used to define veracity scores that correctly capture reliability when the noise is non-stationary.

What would settle it

A simulation under the paper's stated assumptions in which the VS-based estimators exhibit larger asymptotic mean squared error than ordinary least squares would falsify the claimed advantage.
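A small Monte Carlo comparison of this kind might look as follows. This is only a sketch under a toy model (one-dimensional locations, independent noise whose standard deviation varies by site, no spatial random field), so it is a finite-sample check and cannot by itself confirm or falsify an asymptotic statement under the paper's conditions. The sliding-window median/MAD scores and the names `local_scores`, `one_replicate`, and `compare_mse` are assumptions made for illustration.

```python
import numpy as np


def local_scores(z, half_width=10, alpha=0.1):
    """Scores from a sliding window over observations sorted by location.

    Assumption: window median as local centre, window MAD as local spread.
    """
    n = len(z)
    v = np.empty(n)
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        window = np.delete(z[lo:hi], i - lo)       # leave the point itself out
        centre = np.median(window)
        spread = np.median(np.abs(window - centre))
        v[i] = np.exp(-abs(z[i] - centre) / (alpha + spread))
    return v


def one_replicate(rng, n=300, beta=(1.0, 2.0)):
    s = np.sort(rng.uniform(size=n))               # 1-D locations
    X = np.column_stack([np.ones(n), s])
    # Non-stationary noise: the standard deviation tau_i varies by site,
    # with roughly 10% of sites being very noisy.
    tau = np.where(rng.uniform(size=n) < 0.1, 5.0, 0.2)
    z = X @ np.asarray(beta) + tau * rng.normal(size=n)
    b_ols = np.linalg.lstsq(X, z, rcond=None)[0]
    v = local_scores(z)
    Xw = X * v[:, None]
    b_vs = np.linalg.solve(Xw.T @ X, Xw.T @ z)
    return b_ols, b_vs


def compare_mse(reps=100, seed=1, beta=(1.0, 2.0)):
    rng = np.random.default_rng(seed)
    draws = np.array([one_replicate(rng, beta=beta) for _ in range(reps)])
    err = draws - np.asarray(beta)                 # shape (reps, method, coefficient)
    mse = (err ** 2).sum(axis=2).mean(axis=0)      # summed squared error per method
    return {"ols": float(mse[0]), "vs": float(mse[1])}


result = compare_mse()
print(result)
```

A falsifying experiment in the sense described above would need to replace the toy noise and trend with a design that satisfies the paper's stated assumptions and then track the two errors as n grows.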

Figures

Figures reproduced from arXiv: 1906.08843 by Arnab Chakraborty, Soumendra N. Lahiri.

**Figure 2.** VS-based smoothing of residuals: histogram of observed residuals from VS-based …

**Figure 3.** Variogram estimation. Next, we compare the VS-based analysis with the robust REML approach. To implement the robust REML methodology on the coal ash data the R-package georob (Papritz 2018a) is used. We use leave-one-out cross-validation technique to compare the two approaches: for each of the observations in the coal ash data, we consider it as the test data and try to predict (kriging) it using all other …

**Figure 4.** Prediction comparison between VS and robust-REML: (a) - empirical c.d.f. of …
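The leave-one-out comparison mentioned in the Figure 3 text can be sketched generically. The paper predicts each held-out observation by kriging; the sketch below substitutes a simple inverse-distance-weighted predictor as a stand-in, so it illustrates only the cross-validation loop, not the paper's predictor or the georob workflow. The names `idw_predict` and `leave_one_out_errors` are hypothetical.

```python
import numpy as np


def idw_predict(train_coords, train_z, target, power=2.0):
    """Inverse-distance-weighted prediction; a stand-in for kriging."""
    d = np.linalg.norm(train_coords - target, axis=1)
    if np.any(d == 0):                       # exact co-located observation
        return float(train_z[d == 0].mean())
    w = 1.0 / d ** power
    return float(np.sum(w * train_z) / np.sum(w))


def leave_one_out_errors(coords, z, predict=idw_predict):
    """Hold out each observation in turn and predict it from all the others."""
    coords = np.asarray(coords, dtype=float)
    z = np.asarray(z, dtype=float)
    n = len(z)
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i             # every index except the held-out one
        errors[i] = z[i] - predict(coords[keep], z[keep], coords[i])
    return errors


# Toy data on a line: z equals the x-coordinate.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
z = np.array([0.0, 1.0, 2.0])
errors = leave_one_out_errors(coords, z)     # approximately [-1.2, 0.0, 1.2]
```

Passing a different `predict` callable (for example one wrapping a kriging routine) would let the same loop compare two methods on identical held-out points.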

Original abstract

Measuring veracity or reliability of noisy data is of utmost importance, especially in scenarios where the information is gathered through automated systems. In a recent paper, Chakraborty et al. (2019) introduced a veracity scoring technique for geostatistical data. The authors used high-quality 'reference' data to measure the veracity of the varying-quality observations and incorporated the veracity scores in their analysis of mobile-sensor generated noisy weather data to generate efficient predictions of the ambient temperature process. In this paper, we consider the scenario when no reference data is available and hence, the veracity scores (referred to as VS) are defined based on 'local' summaries of the observations. We develop a VS-based estimation method for parameters of a spatial regression model. Under a non-stationary noise structure and fairly general assumptions on the underlying spatial process, we show that the VS-based estimators of the regression parameters are consistent. Moreover, we establish the advantage of the VS-based estimators as compared to the ordinary least squares (OLS) estimator by analyzing their asymptotic mean squared errors. We illustrate the merits of the VS-based technique through simulations and apply the methodology to a real data set on mass percentages of ash in coal seams in Pennsylvania.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper extends the authors' prior veracity scoring to the no-reference case for spatial regression, proves consistency of the resulting estimators, and shows asymptotic MSE gains over OLS under non-stationary noise.

The core contribution is the adaptation of veracity scoring to settings with no reference data. The authors define scores from local summaries of the observations, plug them into spatial regression estimation, and establish consistency plus lower asymptotic mean squared error than ordinary least squares when noise is non-stationary. This is a straightforward extension of their 2019 work rather than a broad new framework, but the no-reference version plus the formal comparison is what is actually new here. They also run simulations and apply the method to coal ash percentage data from Pennsylvania, which gives the claims some practical grounding. The real-data example is useful because it shows the technique on a genuine spatial dataset where reference measurements are unavailable. The approach targets exactly the sensor-based problems the abstract describes.

The main soft spot is the reliance on local summaries to produce weights that stay uncorrelated with the regression errors and the spatial signal. The stress-test note flags a real issue: if the local windows overlap covariate variation or the correlation length of the process, the scores could induce bias even under non-stationary noise. The abstract invokes fairly general assumptions without spelling out the separation condition, so the proofs need to make that step explicit and check whether it holds under realistic window sizes. If the paper only assumes it away rather than deriving it, that would be the load-bearing point for referees to examine.

This paper is for spatial statisticians who work with automated or mobile-sensor data and need a way to down-weight noisy observations without clean reference values. A reader already familiar with geostatistical regression and asymptotic arguments will get the most out of the consistency and MSE results.

It deserves a serious referee because it supplies theorems, a direct comparison to OLS, simulations, and an application; the claims are concrete enough that reviewers can verify the conditions and the separation argument rather than just accepting the abstract.

Referee Report

1 major / 1 minor

Summary. The manuscript develops a veracity scoring (VS) technique for spatial regression when no reference data is available, defining VS from local summaries of the observations. It claims to establish consistency of the resulting VS-based estimators for regression parameters under a non-stationary noise structure and fairly general assumptions on the underlying spatial process, to demonstrate an asymptotic MSE advantage relative to OLS, and to illustrate the approach via simulations and an application to ash percentages in Pennsylvania coal seams.

Significance. If the consistency result and the asymptotic MSE comparison hold under the stated conditions, the work supplies a practical reference-free weighting scheme for noisy geostatistical data that can improve efficiency over OLS when noise is non-stationary. The explicit asymptotic comparison is a methodological strength when the derivations are complete and the separation between local summaries and the mean process is verified.

major comments (1)

[Abstract] Abstract (consistency claim): the stated consistency of VS-based estimators requires that veracity scores computed from local summaries remain asymptotically uncorrelated with the regression errors and covariates. The abstract invokes a non-stationary noise structure and 'fairly general assumptions' but supplies no explicit condition ensuring that the local window used for each score does not overlap with covariate variation or the spatial correlation length; without such a condition the weights can induce asymptotic bias even while noise remains non-stationary, which is load-bearing for the central claim.
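The role of this condition can be seen from the usual error decomposition of a weighted least-squares estimator. The following is a sketch, assuming the model Z = Xβ + ε, where ε collects the spatial process and the measurement noise, and the weighted form β̂_vs = (X′D_vX)⁻¹X′D_vZ with D_v = diag(v_1, …, v_n). The symbols ε, v_i, and x_i (the i-th row of X written as a column vector) are introduced here for illustration.

```latex
\begin{align*}
\hat\beta_{vs} - \beta
  &= (X' D_v X)^{-1} X' D_v (X\beta + \varepsilon) - \beta \\
  &= (X' D_v X)^{-1} X' D_v \varepsilon \\
  &= \Bigl( \tfrac{1}{n} \sum_{i=1}^{n} v_i x_i x_i' \Bigr)^{-1}
     \Bigl( \tfrac{1}{n} \sum_{i=1}^{n} v_i x_i \varepsilon_i \Bigr).
\end{align*}
```

Because each v_i is computed from observations at or near s_i, it is in general not independent of ε_i, so the second factor need not vanish in the limit unless some condition forces n⁻¹ Σ v_i x_i ε_i to converge to zero in probability. That is the quantity this comment is concerned with.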

minor comments (1)

The phrase 'fairly general assumptions' should be replaced by an enumerated list of the precise conditions used in the consistency theorem so that readers can verify applicability.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their careful reading and constructive feedback. Below we respond point-by-point to the single major comment.

Referee: [Abstract] Abstract (consistency claim): the stated consistency of VS-based estimators requires that veracity scores computed from local summaries remain asymptotically uncorrelated with the regression errors and covariates. The abstract invokes a non-stationary noise structure and 'fairly general assumptions' but supplies no explicit condition ensuring that the local window used for each score does not overlap with covariate variation or the spatial correlation length; without such a condition the weights can induce asymptotic bias even while noise remains non-stationary, which is load-bearing for the central claim.

Authors: The manuscript derives consistency under a set of assumptions on the spatial process (detailed in the theoretical sections) that explicitly require the local windows for computing veracity scores to be of fixed or slowly growing size relative to both the covariate smoothness scale and the spatial correlation length. These conditions ensure the required asymptotic uncorrelation between the scores and the regression errors/covariates. We nevertheless agree that the abstract would benefit from greater specificity on this point and will revise it to reference the localization conditions on the windows.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity; consistency theorem relies on stated assumptions rather than reducing to input definitions

The paper defines veracity scores from local summaries of observations (when no reference data exists) and presents consistency of the VS-based regression estimators as a theorem under explicit assumptions of non-stationary noise structure plus general conditions on the spatial process. This does not reduce by construction to the definition of the scores themselves, nor does it rename a fitted quantity as a prediction. The citation to Chakraborty et al. (2019) introduces the reference-data version of the method and is not load-bearing for the no-reference extension or the consistency result here. No self-citation chain, ansatz smuggling, or uniqueness theorem imported from the authors' prior work is used to force the central claims. The derivation is therefore self-contained against the stated assumptions and external to any fitted inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities. The method implicitly relies on the definition of local-summary veracity scores and the non-stationary noise model, but these are not itemized.

pith-pipeline@v0.9.0 · 5756 in / 1162 out tokens · 20531 ms · 2026-05-25T19:02:02.994973+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: $V(s_i) = \exp\{-|Z(s_i) - C(Z_i)| / (\alpha + D(Z_i))\}$; $\hat\beta_{vs} = (X' D_v X)^{-1} X' D_v Z$; consistency via extended Ghosh-Bahadur under α-mixing and mixed-increasing domain asymptotics (C.1–C.13)

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Paper passage: $\mathrm{MSE}(\mathrm{VS})$ bounded by $C_3\bigl(n^{-1}(C_1(q_e))^2 + C_2\lambda_n^{-4}\bigr)$, independent of $\tau_i^2$; OLS lower bound contains $\sum \tau_i^2$ terms
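
To see where noise-variance terms of this kind can enter an OLS error expression, consider a sketch under an assumed model Z = Xβ + η + e, where η is the spatial process with covariance matrix Σ_η and the e_i are independent, mean-zero noise terms with Var(e_i) = τ_i², independent of η. The symbols η, e, Σ_η, and T are illustrative and may differ from the paper's notation.

```latex
\begin{align*}
\operatorname{Cov}(\hat\beta_{ols})
  &= (X'X)^{-1} X' (\Sigma_\eta + T) X (X'X)^{-1},
  \qquad T = \operatorname{diag}(\tau_1^2, \dots, \tau_n^2), \\
(X'X)^{-1} X' T X (X'X)^{-1}
  &= (X'X)^{-1} \Bigl( \sum_{i=1}^{n} \tau_i^2 \, x_i x_i' \Bigr) (X'X)^{-1}.
\end{align*}
```

Under this assumed model the noise contribution to the OLS covariance scales with the τ_i², so a few very noisy sites can dominate the OLS error, whereas the VS bound in the passage above is described as not depending on τ_i².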

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

23 extracted references · 23 canonical work pages

[3] Chakraborty, A., Lahiri, S. N. and Wilson, A. (2019) A statistical analysis of noisy crowdsourced weather data. https://arxiv.org/abs/1902.06183. Submitted to Annals of Applied Statistics.

[4] Conroy, N. J., Rubin, V. L. and Chen, Y. (2015) Veracity roadmap: Is big data objective, truthful and credible? 78th ASIS&T Annual Meeting: Information Science with Impact: Research in and for the Community.

[5] Cressie, N. (1993) Statistics for spatial data. Wiley Series in Probability and Mathematical Statistics. John Wiley & Sons, Inc.

[6] Cressie, N. and Hawkins, D. M. (1980) Robust estimation of the variogram: I. Journal of the International Association for Mathematical Geology, 12, 115–125.

[7] Diggle, P. J., Menezes, R. and Su, T.-L. (2010) Geostatistical inference under preferential sampling. Journal of the Royal Statistical Society: Series C (Applied Statistics), 59, 191–232.

[8] Evans, B. J. (1997) Dynamic display of spatial data-reliability: Does it benefit the map user? Computers & Geosciences, 23, 409–422.

[9] Gelfand, A. E., Diggle, P. J., Fuentes, M. and Guttorp, P. (2010) Handbook of spatial statistics. Chapman & Hall/CRC Handbooks of Modern Statistical Methods. CRC Press.

[10] Ghosh, J. K. (1971) A new proof of the Bahadur representation of quantiles and an application. Annals of Mathematical Statistics, 42, 1957–1961.

[11] Gomez, M. and Hazen, K. (1970) Evaluating sulfur and ash distribution in coal seams by statistical response surface regression analysis. Tech. rep., Bureau of Mines, Denver, Colo. (USA).

[12] Hall, P. and Patil, P. (1994) Properties of nonparametric estimators of autocovariance for stationary random fields. Probability Theory and Related Fields, 99, 399–424.

[13] Haskard, K. A. (2007) An anisotropic Matérn spatial covariance model: REML estimation and properties. Ph.D. thesis, University of Adelaide.

[14] Huber, P. J. and Ronchetti, E. M. (2009) Robust statistics. Wiley Series in Probability and Statistics. John Wiley & Sons, Inc.

[15] Künsch, H. R., Papritz, A., Schwierz, C. and Stahel, A. W. (2011) Robust estimation of the external drift and the variogram of spatial data. ISI 58th World Statistics Congress of the International Statistical Institute, Aug 21–26.

[16] Lahiri, S. N., Kaiser, M. S., Cressie, N. and Hsu, N.-J. (1999) Prediction of spatial cumulative distribution functions using subsampling. Journal of the American Statistical Association, 94, 86–97.

[17] Lahiri, S. N., Lee, Y. and Cressie, N. (2002) On asymptotic distribution and asymptotic efficiency of least squares estimators of spatial variogram parameters. Journal of Statistical Planning and Inference, 102, 65–85.

[18] Lukoianova, T. and Rubin, V. L. (2014) Veracity roadmap: Is big data objective, truthful and credible? Advances in Classification Research Online. doi:10.7152/acro.v24i1.14671.

[19] Papritz, A. (2018a) georob: Robust geostatistical analysis of spatial data. https://CRAN.R-project.org/package=georob. R package version 0.3-7.

[20] Papritz, A. (2018b) Tutorial and manual for geostatistical analyses with the R package georob. https://cran.r-project.org/web/packages/georob/vignettes/georob_vignette.pdf. Accessed: 2019-02-12.

[21] Pebesma, E. J. (2004) Multivariable geostatistics in S: the gstat package. Computers & Geosciences, 30, 683–691.

[22] Rendon, H., Wilson, A. and Stegall, J. (2018) Is it 'fake news'? Intelligence community expertise and news dissemination as measurements for media reliability. Intelligence and National Security, 33.

[23] Sen, P. K. (1968) Asymptotic normality of sample quantiles of m-dependent processes. Annals of Mathematical Statistics, 39, 1724–1730.
