METBRA25Y: Brazil Surface Meteorology Archive with Harmonized Variables and Quality Control
Pith reviewed 2026-05-12 00:59 UTC · model grok-4.3
The pith
A harmonized archive supplies hourly meteorological observations from 616 Brazilian stations across 2000-2025 with standardized variables and quality flags.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The processing pipeline successfully converts heterogeneous INMET annual archives into a single, compressed, station-organized collection of hourly observations spanning 2000 to 2025, complete with a canonical variable schema, two-stage quality-control flags, and supporting summary files that document coverage and data integrity for 616 stations.
What carries the argument
The two-stage quality-control protocol, which first replaces physically implausible readings with missing values and flags them, then applies temporal and cross-variable consistency checks that add diagnostic flags without necessarily overwriting the original measurements.
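A minimal sketch of how such a two-stage protocol might look is given here. The variable names, the plausibility limits, and the flag labels are all assumptions made for illustration; the paper's actual thresholds and flag vocabulary are not stated on this page. Only a cross-variable check is shown for stage two; a temporal check would follow the same pattern.

```python
# Hypothetical plausibility limits (illustrative only, not taken from the paper).
LIMITS = {
    "air_temperature_c": (-40.0, 60.0),
    "dew_point_c": (-50.0, 40.0),
    "relative_humidity_pct": (0.0, 100.0),
}


def stage1_range_check(record: dict) -> tuple[dict, dict]:
    """Stage 1: replace physically implausible values with None and flag them."""
    cleaned, flags = {}, {}
    for name, value in record.items():
        low, high = LIMITS[name]
        implausible = value is not None and not (low <= value <= high)
        cleaned[name] = None if implausible else value
        flags[name] = "implausible" if implausible else "ok"
    return cleaned, flags


def stage2_consistency(cleaned: dict, flags: dict) -> dict:
    """Stage 2: add diagnostic flags; measurements are left untouched."""
    out = dict(flags)
    temperature = cleaned.get("air_temperature_c")
    dew_point = cleaned.get("dew_point_c")
    # Cross-variable check: dew point should not exceed air temperature.
    if temperature is not None and dew_point is not None and dew_point > temperature:
        out["dew_point_c"] = "suspect_dewpoint_above_temperature"
    return out


record = {"air_temperature_c": 25.0, "dew_point_c": 27.0, "relative_humidity_pct": 140.0}
cleaned, flags = stage1_range_check(record)
flags = stage2_consistency(cleaned, flags)
```

In this example the humidity reading is set to missing in stage one, while the dew point keeps its value and only receives a diagnostic flag in stage two.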
Load-bearing premise
Public INMET annual files contain raw observations that can be read, timestamped, and normalized without creating systematic errors or discarding important information.
What would settle it
Independent verification that a substantial fraction of harmonized values deviate systematically from trusted reference measurements at the same stations and hours would show the archive does not deliver reliable data.
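One way such a verification could be framed is as a mean-bias computation over matched station-hours. The sketch below assumes both sources are available as dictionaries keyed by a hypothetical (station, timestamp) pair; the function name and data layout are illustrative and do not come from the paper.

```python
from statistics import mean


def mean_bias(archive: dict, reference: dict):
    """Mean of (archive - reference) over keys present and non-missing in both."""
    pairs = [
        (archive[key], reference[key])
        for key in archive.keys() & reference.keys()
        if archive[key] is not None and reference[key] is not None
    ]
    if not pairs:
        return None  # nothing to compare
    return mean(a - r for a, r in pairs)


# Hypothetical air temperature values in degrees Celsius.
archive = {
    ("ST01", "2020-01-01T00:00"): 25.0,
    ("ST01", "2020-01-01T01:00"): 26.0,
    ("ST01", "2020-01-01T02:00"): None,  # missing in the archive, skipped
}
reference = {
    ("ST01", "2020-01-01T00:00"): 24.5,
    ("ST01", "2020-01-01T01:00"): 25.5,
    ("ST01", "2020-01-01T02:00"): 20.0,
}
bias = mean_bias(archive, reference)
```

A bias that stays far from zero across many stations and hours would be the kind of systematic deviation described above; a single summary number is of course only a starting point.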
Original abstract
This data paper describes METBRA25Y, a harmonized archive of hourly surface meteorological observations from Brazil derived from public historical records of the Instituto Nacional de Meteorologia (INMET). The dataset was designed to support reproducible environmental, climatological, hydrological, agricultural, urban-risk, and machine-learning studies that require station-level meteorological time series with standardized variable names and explicit quality-control metadata. The processing workflow ingests annual INMET archives, parses station metadata from raw file headers, normalizes heterogeneous Portuguese column names into a canonical schema, constructs hourly timestamps, consolidates observations by city and station, and exports compressed CSV files together with station manifests, per-station quality flags, daily precipitation aggregates, variable-level failure summaries, and missing-data audits. The quality-control protocol follows a two-stage strategy: first, physically implausible values are converted to missing values and flagged; second, temporal and cross-variable consistency checks generate diagnostic flags without necessarily overwriting the original measurements. The resulting package covers observations between 2000 and 2025, with station-specific temporal coverage, and includes key meteorological variables such as precipitation, air temperature, dew point, relative humidity, atmospheric pressure, wind speed, wind gust, wind direction, and global solar radiation. Based on the summary files included in the current release snapshot, the archive contains 616 unique station codes across variable summaries, of which 605 have coordinates within a broad Brazil plausibility envelope. This paper documents the dataset provenance, file organization, harmonized schema, quality-control rules, technical validation outputs, limitations, and recommended usage practices.
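The header normalization and timestamp construction steps described in the abstract could look roughly like the sketch below. The raw header strings, the canonical names, the accepted date and hour formats, and the assumption that hours are in UTC are all illustrative guesses, not details confirmed by the paper.

```python
import unicodedata
from datetime import datetime, timezone

# Hypothetical mapping from cleaned Portuguese headers to canonical names.
CANONICAL = {
    "precipitacao total, horario (mm)": "precipitation_mm",
    "temperatura do ar - bulbo seco, horaria (c)": "air_temperature_c",
    "umidade relativa do ar, horaria (%)": "relative_humidity_pct",
}


def normalize_header(raw: str) -> str:
    """Strip accents, lowercase, drop the degree sign, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", raw)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    key = " ".join(stripped.lower().replace("°", "").split())
    # Unknown headers are returned in cleaned form rather than dropped.
    return CANONICAL.get(key, key)


def build_timestamp(date_text: str, hour_text: str) -> datetime:
    """Combine a date and an hour field into a timezone-aware hourly timestamp.

    Assumes dates like "2020-01-31" or "2020/01/31" and hours like
    "1300 UTC" or "13:00"; both formats are assumptions for this sketch.
    """
    digits = "".join(ch for ch in hour_text if ch.isdigit())
    day = datetime.strptime(date_text.replace("/", "-"), "%Y-%m-%d")
    return day.replace(hour=int(digits[:2]), tzinfo=timezone.utc)


name = normalize_header("PRECIPITAÇÃO TOTAL, HORÁRIO (mm)")
stamp = build_timestamp("2020/01/31", "1300 UTC")
```

Returning unmapped headers in cleaned form, instead of raising, makes it easier to audit which raw columns the mapping does not yet cover.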
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript describes METBRA25Y, a harmonized archive of hourly surface meteorological observations from Brazil derived from public INMET records (2000-2025). It details a two-stage QC protocol (physically implausible values flagged as missing, followed by temporal/cross-variable consistency checks), variable normalization from Portuguese headers to a canonical schema, timestamp construction, station consolidation, and release of compressed CSVs plus manifests, per-station QC flags, daily precipitation aggregates, failure summaries, and missing-data audits. The archive covers 616 unique stations (605 with plausible coordinates) and variables including precipitation, air temperature, dew point, relative humidity, atmospheric pressure, wind speed/gust/direction, and global solar radiation.
Significance. If the described workflow is implemented as stated, the archive supplies a valuable, reproducible resource for climatological, hydrological, agricultural, urban-risk, and machine-learning studies requiring standardized Brazilian station-level time series. Explicit QC metadata and audits address common usability barriers in raw public meteorological archives, and the coordinate plausibility check plus station manifests provide basic validation.
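The coordinate plausibility check can be pictured as a simple bounding-box test. In the sketch below, the bounds, the station codes, and the coordinates are hypothetical; the paper's actual envelope is not reproduced on this page.

```python
# Hypothetical bounding box loosely enclosing Brazil (decimal degrees).
LAT_MIN, LAT_MAX = -35.0, 6.0
LON_MIN, LON_MAX = -75.0, -28.0


def in_envelope(lat, lon) -> bool:
    """Return True when both coordinates exist and fall inside the box."""
    if lat is None or lon is None:
        return False
    return LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX


# Hypothetical station manifest: code -> (latitude, longitude).
stations = {
    "ST01": (-15.79, -47.93),  # inside the box
    "ST02": (0.0, 0.0),        # placeholder coordinates, outside the box
    "ST03": (None, None),      # coordinates missing from the header
}
plausible = sorted(code for code, (lat, lon) in stations.items() if in_envelope(lat, lon))
```

A count such as "605 of 616" would then be the length of the plausible list relative to the full manifest.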
minor comments (2)
- [Abstract] The 'broad Brazil plausibility envelope' used for coordinate validation is not defined; stating the exact bounding box or method would improve transparency.
- The manuscript would benefit from a summary table (e.g., in §3 or §4) listing per-variable station counts, temporal coverage ranges, and missing-data percentages to support the headline statistics.
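A per-variable summary of the kind requested in the second comment could be computed along these lines. The row layout (a station field, an ISO timestamp string, and one key per variable) and the variable names are assumptions for illustration, not the archive's documented schema.

```python
def summarize(rows: list, variables: list) -> dict:
    """Per-variable station count, coverage range, and missing percentage."""
    summary = {}
    for var in variables:
        present = [row for row in rows if row.get(var) is not None]
        stamps = [row["timestamp"] for row in present]
        summary[var] = {
            "stations": len({row["station"] for row in present}),
            # ISO 8601 strings in one fixed format sort chronologically.
            "first": min(stamps) if stamps else None,
            "last": max(stamps) if stamps else None,
            "missing_pct": 100.0 * (len(rows) - len(present)) / len(rows) if rows else None,
        }
    return summary


# Hypothetical hourly rows.
rows = [
    {"station": "ST01", "timestamp": "2000-01-01T00:00", "precipitation_mm": 0.0, "air_temperature_c": 24.1},
    {"station": "ST01", "timestamp": "2000-01-01T01:00", "precipitation_mm": None, "air_temperature_c": 23.8},
    {"station": "ST02", "timestamp": "2001-05-01T00:00", "precipitation_mm": 1.2, "air_temperature_c": None},
    {"station": "ST02", "timestamp": "2001-05-01T01:00", "precipitation_mm": None, "air_temperature_c": None},
]
table = summarize(rows, ["precipitation_mm", "air_temperature_c"])
```

Here the missing percentage is taken over all rows supplied; a real audit would more likely measure it against each station's expected hourly record.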
Simulated Author's Rebuttal
We thank the referee for their positive assessment of the METBRA25Y manuscript and for recommending acceptance. We are pleased that the description of the harmonized archive, the two-stage QC protocol, and the provided metadata were viewed as addressing common usability barriers in public meteorological data.
Circularity Check
No circularity: direct description of external data ingestion and harmonization
The paper is a data release manuscript whose central claim is the existence and documentation of a harmonized archive produced by ingesting public INMET annual files, parsing headers, normalizing column names, constructing timestamps, applying two-stage QC (implausibility flagging followed by diagnostic consistency checks), and exporting CSVs with manifests and audits. No equations, fitted parameters, predictions, or uniqueness theorems appear. All quantitative statements (616 stations, 605 with plausible coordinates, 2000-2025 coverage, variable list) are summary statistics of the processed output rather than derived claims that reduce to self-referential definitions or self-citations. The workflow is presented as a transparent, externally verifiable sequence applied to an independent public source; the weakest assumption (raw INMET accuracy) is explicitly mitigated by per-station flags and missing-data audits rather than asserted. No load-bearing step reduces to its own inputs by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Public INMET historical archives constitute the authoritative source of Brazilian surface meteorological observations.
Reference graph
Works this paper leans on
- [1] Instituto Nacional de Meteorologia (INMET), “BDMEP – Banco de Dados Meteorologicos para Ensino e Pesquisa,” 2026. [Online]. Available: https://bdmep.inmet.gov.br/. Accessed: 2026-05-02.
- [2] Instituto Nacional de Meteorologia (INMET), “Catalogo de Estacoes Automaticas,” 2026. [Online]. Available: https://portal.inmet.gov.br/paginas/catalogoaut. Accessed: 2026-05-02.
- [3] World Meteorological Organization, Guide to Climatological Practices, WMO-No. 100. Geneva, Switzerland: World Meteorological Organization, 2011.
- [4] World Meteorological Organization, Guide to Instruments and Methods of Observation (WMO-No. 8). [Online]. Available: https://wmo.int/guide-instruments-and-methods-of-observation-wmo-no-8-0. Accessed: 2026-05-02.
- [5] M. D. Wilkinson et al., “The FAIR Guiding Principles for scientific data management and stewardship,” Scientific Data, vol. 3, p. 160018, 2016, doi: 10.1038/sdata.2016.18.
- [6] A. Smith, N. Lott, and R. Vose, “The Integrated Surface Database: Recent Developments and Partnerships,” Bulletin of the American Meteorological Society, vol. 92, no. 6, pp. 704–708, 2011, doi: 10.1175/2011BAMS3015.1.
- [7] R. J. H. Dunn et al., “HadISD: a quality-controlled global synoptic report database for selected variables at long-term stations from 1973–2011,” Climate of the Past, vol. 8, no. 5, pp. 1649–1679, 2012, doi: 10.5194/cp-8-1649-2012.
- [8] R. J. H. Dunn, K. M. Willett, D. E. Parker, and L. Mitchell, “Expanding HadISD: quality-controlled, sub-daily station data from 1931,” Geoscientific Instrumentation, Methods and Data Systems, vol. 5, pp. 473–491, 2016, doi: 10.5194/gi-5-473-2016.
- [9] I. Durre, M. J. Menne, B. E. Gleason, T. G. Houston, and R. S. Vose, “Comprehensive Automated Quality Assurance of Daily Surface Observations,” Journal of Applied Meteorology and Climatology, vol. 49, no. 8, pp. 1615–1633, 2010, doi: 10.1175/2010JAMC2375.1.
- [10] M. J. Menne, I. Durre, R. S. Vose, B. E. Gleason, and T. G. Houston, “An Overview of the Global Historical Climatology Network-Daily Database,” Journal of Atmospheric and Oceanic Technology, vol. 29, no. 7, pp. 897–910, 2012, doi: 10.1175/JTECH-D-11-00103.1.
- [11] E. Lewis et al., “Quality control of a global hourly rainfall dataset,” Environmental Modelling & Software, vol. 144, p. 105169, 2021, doi: 10.1016/j.envsoft.2021.105169.
- [12] M. Nishimura et al., “Quality-controlled meteorological datasets from SIGMA automatic weather stations in northwest Greenland, 2012–2020,” Earth System Science Data, vol. 15, pp. 5207–5226, 2023, doi: 10.5194/essd-15-5207-2023.
- [13] A. C. Xavier, C. W. King, and B. R. Scanlon, “Daily gridded meteorological variables in Brazil (1980–2013),” International Journal of Climatology, vol. 36, no. 6, pp. 2644–2659, 2016, doi: 10.1002/joc.4518.
- [14] A. C. Xavier, B. R. Scanlon, C. W. King, and A. I. Alves, “New improved Brazilian daily weather gridded data (1961–2020),” International Journal of Climatology, vol. 42, no. 16, pp. 8390–8404, 2022, doi: 10.1002/joc.7731.
- [15] R. L. Costa et al., “Gap Filling and Quality Control Applied to Meteorological Variables Measured in the Northeast Region of Brazil,” Atmosphere, vol. 12, no. 10, p. 1278, 2021, doi: 10.3390/atmos12101278.
- [16] M. Lima Castro, L. Lusquino Filho, and W. Dantas Vichete, “METBRA25Y: Brazil Surface Meteorology Archive with Harmonized Variables and Quality Control,” Zenodo, version v1.0.0, 2026, doi: 10.5281/zenodo.19964979.