
arxiv: 2605.08701 · v1 · submitted 2026-05-09 · 💻 cs.LG · physics.ao-ph

METBRA25Y: Brazil Surface Meteorology Archive with Harmonized Variables and Quality Control

Pith reviewed 2026-05-12 00:59 UTC · model grok-4.3

keywords: Brazil, meteorological data, INMET, harmonized dataset, quality control, hourly time series, surface observations, climatology

The pith

A harmonized archive supplies hourly meteorological observations from 616 Brazilian stations across 2000-2025 with standardized variables and quality flags.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces METBRA25Y, a processed collection of surface weather records drawn from public INMET annual files. It applies a workflow that parses station headers, converts Portuguese column labels to a common schema, builds consistent hourly timestamps, merges data by station, and attaches quality-control metadata. The result supplies precipitation, air temperature, dew point, humidity, pressure, wind parameters, and solar radiation, each accompanied by flags that mark implausible values or consistency issues. This structure is intended to let researchers in climatology, hydrology, agriculture, urban planning, and machine learning work with comparable time series without repeating the cleaning steps. The archive includes station manifests, daily precipitation sums, and missing-data reports to support transparent use.
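The header-parsing step can be pictured with a small sketch. It is not the METBRA25Y code. It assumes that each INMET annual CSV opens with eight semicolon-separated `KEY:;VALUE` lines and writes decimals with a comma. The sample values are illustrative, and the name `parse_station_header` is hypothetical.

```python
# Illustrative file excerpt in the assumed INMET header layout; values are examples only.
SAMPLE_FILE = """REGIAO:;CO
UF:;DF
ESTACAO:;BRASILIA
CODIGO (WMO):;A001
LATITUDE:;-15,78944444
LONGITUDE:;-47,92583332
ALTITUDE:;1160,96
DATA DE FUNDACAO:;07/05/00
Data;Hora UTC;PRECIPITAÇÃO TOTAL, HORÁRIO (mm)
2019/01/01;0000 UTC;0
"""

# Header keys whose values are numbers written with a decimal comma (assumption).
NUMERIC_KEYS = {"LATITUDE", "LONGITUDE", "ALTITUDE"}


def parse_station_header(text: str, n_header_lines: int = 8) -> dict:
    """Parse the leading 'KEY:;VALUE' lines of a raw file into a metadata dict."""
    meta = {}
    for line in text.splitlines()[:n_header_lines]:
        key, _, value = line.partition(";")
        key = key.strip().rstrip(":").strip()
        value = value.strip()
        if key in NUMERIC_KEYS and value:
            value = float(value.replace(",", "."))  # decimal comma -> decimal point
        meta[key] = value
    return meta


meta = parse_station_header(SAMPLE_FILE)
print(meta["CODIGO (WMO)"], meta["LATITUDE"], meta["LONGITUDE"])
```

The header length is a parameter because it is itself an assumption about the file layout. A parser should fail loudly, not silently shift rows, if a year's files differ.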

Core claim

The processing pipeline successfully converts heterogeneous INMET annual archives into a single, compressed, station-organized collection of hourly observations spanning 2000 to 2025, complete with a canonical variable schema, two-stage quality-control flags, and supporting summary files that document coverage and data integrity for 616 stations.
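A canonical variable schema implies a mapping from the heterogeneous Portuguese INMET labels to fixed names. The sketch below is minimal and rests on two assumptions. First, the labels are taken to contain the substrings listed below once accents are stripped and the text is upper-cased. Second, the canonical names (`precip_mm`, `air_temp_c`, and so on) are hypothetical placeholders, not the paper's published schema.

```python
import re
import unicodedata

# Hypothetical mapping: accent-free, upper-case label substrings -> canonical names.
# The first matching pattern wins.
CANONICAL_PATTERNS = [
    ("PRECIPITACAO TOTAL", "precip_mm"),
    ("TEMPERATURA DO PONTO DE ORVALHO", "dew_point_c"),
    ("TEMPERATURA DO AR - BULBO SECO", "air_temp_c"),
    ("UMIDADE RELATIVA DO AR, HORARIA", "rel_humidity_pct"),
    ("PRESSAO ATMOSFERICA AO NIVEL DA ESTACAO", "pressure_hpa"),
    ("VENTO, RAJADA MAXIMA", "wind_gust_ms"),
    ("VENTO, VELOCIDADE HORARIA", "wind_speed_ms"),
    ("VENTO, DIRECAO HORARIA", "wind_dir_deg"),
    ("RADIACAO GLOBAL", "solar_rad_kj_m2"),
]


def strip_accents(text: str) -> str:
    """Remove combining marks, e.g. 'PRECIPITAÇÃO' -> 'PRECIPITACAO'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def canonical_name(raw_label: str):
    """Return the canonical name for a raw column label, or None if unmapped."""
    key = re.sub(r"\s+", " ", strip_accents(raw_label)).upper().strip()
    for pattern, name in CANONICAL_PATTERNS:
        if pattern in key:
            return name
    return None


for label in ["PRECIPITAÇÃO TOTAL, HORÁRIO (mm)", "VENTO, DIREÇÃO HORARIA (gr) (° (gr))", "Data"]:
    print(label, "->", canonical_name(label))
```

Unmapped labels return `None` rather than a guess. This lets a pipeline list them in an audit instead of silently dropping or mislabeling a column.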

What carries the argument

The two-stage quality-control protocol that first replaces physically implausible readings with missing values and then applies temporal and cross-variable consistency checks while preserving the original measurements.
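The key distinction between the two stages is that stage 1 overwrites values while stage 2 only annotates them. The sketch below illustrates that split. The bounds, the 0.5 °C tolerance, the 8 °C hour-to-hour step limit, and all names are hypothetical placeholders, not the thresholds METBRA25Y uses.

```python
import math

# Hypothetical physical-plausibility bounds per canonical variable (illustrative).
BOUNDS = {
    "air_temp_c": (-10.0, 50.0),
    "dew_point_c": (-30.0, 40.0),
    "rel_humidity_pct": (0.0, 100.0),
}


def stage1(record: dict) -> tuple:
    """Stage 1: copy the record, set out-of-range values to NaN, and flag them."""
    cleaned, flags = dict(record), {}
    for var, (lo, hi) in BOUNDS.items():
        value = record.get(var)
        if value is None or math.isnan(value):
            continue
        if not lo <= value <= hi:
            cleaned[var] = math.nan
            flags[var] = "implausible"
    return cleaned, flags


def stage2(series: list, max_step: float = 8.0, tolerance: float = 0.5) -> list:
    """Stage 2: return per-hour diagnostic flags; the input values are not modified."""
    diagnostics = [[] for _ in series]
    for i, rec in enumerate(series):
        t = rec.get("air_temp_c", math.nan)
        td = rec.get("dew_point_c", math.nan)
        # Cross-variable check: dew point should not exceed air temperature.
        if not (math.isnan(t) or math.isnan(td)) and td > t + tolerance:
            diagnostics[i].append("dew_point_above_air_temp")
        # Temporal check: flag implausibly large hour-to-hour temperature jumps.
        if i > 0:
            prev = series[i - 1].get("air_temp_c", math.nan)
            if not (math.isnan(t) or math.isnan(prev)) and abs(t - prev) > max_step:
                diagnostics[i].append("air_temp_step")
    return diagnostics


hours = [
    {"air_temp_c": 25.0, "dew_point_c": 20.0},
    {"air_temp_c": 25.0, "dew_point_c": 27.0},
    {"air_temp_c": 35.0, "dew_point_c": 20.0},
    {"air_temp_c": 72.0, "dew_point_c": 20.0},
]
cleaned = [stage1(h)[0] for h in hours]
print(stage2(cleaned))
```

In this example the 72 °C reading is removed in stage 1. The dew-point inversion and the 10 °C jump stay in the data and only receive diagnostic flags.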

Load-bearing premise

Public INMET annual files contain raw observations that can be read, timestamped, and normalized without creating systematic errors or discarding important information.
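Timestamp construction is one place where systematic errors could enter, for example if annual files encode the date or hour differently. The sketch below normalizes both conventions to the same timezone-aware UTC timestamp. It assumes dates appear as `YYYY-MM-DD` or `YYYY/MM/DD` and hours as `HH:MM` or `HHMM UTC`; these layouts should be checked against the raw files. The function names are hypothetical.

```python
from datetime import datetime, timezone

# Assumed date layouts across annual files (illustrative).
DATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d")


def parse_date(text: str) -> datetime:
    """Try each assumed layout in turn; reject anything else."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def parse_hour(text: str) -> int:
    """Accept '13:00' or '1300 UTC' and return the whole hour (0-23)."""
    digits = text.upper().replace("UTC", "").replace(":", "").strip()
    if len(digits) != 4 or not digits.isdigit():
        raise ValueError(f"unrecognized hour: {text!r}")
    hour, minute = int(digits[:2]), int(digits[2:])
    if minute != 0 or not 0 <= hour <= 23:
        raise ValueError(f"not a whole UTC hour: {text!r}")
    return hour


def hourly_timestamp(date_text: str, hour_text: str) -> datetime:
    """Combine a date field and an hour field into a UTC-aware hourly timestamp."""
    day = parse_date(date_text)
    return day.replace(hour=parse_hour(hour_text), tzinfo=timezone.utc)


print(hourly_timestamp("2019/01/01", "1300 UTC").isoformat())
```

Rejecting off-hour values such as `13:30` turns a layout change into an explicit error rather than a silent shift. Whether every source file reports hours in UTC is an assumption to verify per file.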

What would settle it

Independent verification that a substantial fraction of harmonized values deviate systematically from trusted reference measurements at the same stations and hours would show the archive does not deliver reliable data.
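This criterion can be made operational with a paired comparison at matching stations and hours. The sketch assumes that archive and reference values for one station and variable are dictionaries keyed by timestamp, with NaN for missing hours. The tolerance and the function name are hypothetical.

```python
import math
from statistics import fmean


def compare_to_reference(archive: dict, reference: dict, tolerance: float) -> dict:
    """Mean bias and fraction of paired hours whose absolute difference exceeds tolerance."""
    common = sorted(
        k for k in archive.keys() & reference.keys()
        if not math.isnan(archive[k]) and not math.isnan(reference[k])
    )
    if not common:
        return {"n": 0, "bias": math.nan, "frac_exceed": math.nan}
    diffs = [archive[k] - reference[k] for k in common]
    exceed = sum(abs(d) > tolerance for d in diffs)
    return {"n": len(diffs), "bias": fmean(diffs), "frac_exceed": exceed / len(diffs)}


archive = {1: 20.0, 2: 21.0, 3: 22.0, 4: math.nan}
reference = {1: 19.0, 2: 21.0, 3: 20.0, 4: 18.0}
print(compare_to_reference(archive, reference, tolerance=0.5))
```

A systematic processing error would show up as a bias that stays away from zero across many stations. A high exceedance fraction with near-zero bias points instead to noise in one of the two sources.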

Figures

Figures reproduced from arXiv: 2605.08701 by Leopoldo Lusquino Filho, Matheus Lima Castro, William Dantas Vichete.

Figure 1. Reproducible construction workflow for METBRA25Y. The first stage ingests and harmonizes raw …
Figure 2. Distribution of post-filter missing or invalid hours by variable summary. Outliers are hidden in the …
Figure 3. Station coordinates retained after applying a broad Brazil plausibility envelope. This figure is a …
Figure 4. Station-code coverage by variable-level summary file in the current release snapshot.
Figure 5. Distribution of station-level start and end years computed from the summary files after coordinate …
Original abstract

This data paper describes METBRA25Y, a harmonized archive of hourly surface meteorological observations from Brazil derived from public historical records of the Instituto Nacional de Meteorologia (INMET). The dataset was designed to support reproducible environmental, climatological, hydrological, agricultural, urban-risk, and machine-learning studies that require station-level meteorological time series with standardized variable names and explicit quality-control metadata. The processing workflow ingests annual INMET archives, parses station metadata from raw file headers, normalizes heterogeneous Portuguese column names into a canonical schema, constructs hourly timestamps, consolidates observations by city and station, and exports compressed CSV files together with station manifests, per-station quality flags, daily precipitation aggregates, variable-level failure summaries, and missing-data audits. The quality-control protocol follows a two-stage strategy: first, physically implausible values are converted to missing values and flagged; second, temporal and cross-variable consistency checks generate diagnostic flags without necessarily overwriting the original measurements. The resulting package covers observations between 2000 and 2025, with station-specific temporal coverage, and includes key meteorological variables such as precipitation, air temperature, dew point, relative humidity, atmospheric pressure, wind speed, wind gust, wind direction, and global solar radiation. Based on the summary files included in the current release snapshot, the archive contains 616 unique station codes across variable summaries, of which 605 have coordinates within a broad Brazil plausibility envelope. This paper documents the dataset provenance, file organization, harmonized schema, quality-control rules, technical validation outputs, limitations, and recommended usage practices.
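The daily precipitation aggregates are hourly totals summed per day. The sketch below shows one way to compute them. It groups hours by the timestamp's calendar date and reports a day as missing unless at least `min_hours` valid hours are present. The 24-hour completeness rule and the function name are hypothetical choices, not necessarily what METBRA25Y applies.

```python
import math
from collections import defaultdict
from datetime import date, datetime, timedelta


def daily_precip(hourly, min_hours: int = 24) -> dict:
    """Sum hourly precipitation (mm) per calendar day; NaN marks missing hours."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    days = set()
    for ts, value in hourly:
        day = ts.date()
        days.add(day)
        if not math.isnan(value):
            totals[day] += value
            counts[day] += 1
    # A day with too few valid hours is reported as NaN rather than a partial sum.
    return {d: (totals[d] if counts[d] >= min_hours else math.nan) for d in sorted(days)}


start = datetime(2020, 1, 1)
hours = [(start + timedelta(hours=h), 0.5) for h in range(48)]
hours[30] = (hours[30][0], math.nan)  # one missing hour on 2 January
result = daily_precip(hours)
print(result[date(2020, 1, 1)], result[date(2020, 1, 2)])
```

Returning NaN for incomplete days avoids underestimating totals, which matters for hydrological uses. If day boundaries should follow local time instead of the timestamp's date, that is an additional assumption to check.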

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

0 major / 2 minor

Summary. The manuscript describes METBRA25Y, a harmonized archive of hourly surface meteorological observations from Brazil derived from public INMET records (2000-2025). It details a two-stage QC protocol (physically implausible values flagged as missing, followed by temporal/cross-variable consistency checks), variable normalization from Portuguese headers to a canonical schema, timestamp construction, station consolidation, and release of compressed CSVs plus manifests, per-station QC flags, daily precipitation aggregates, failure summaries, and missing-data audits. The archive covers 616 unique stations (605 with plausible coordinates) and variables including precipitation, air temperature, dew point, relative humidity, atmospheric pressure, wind speed/gust/direction, and global solar radiation.

Significance. If the described workflow is implemented as stated, the archive supplies a valuable, reproducible resource for climatological, hydrological, agricultural, urban-risk, and machine-learning studies requiring standardized Brazilian station-level time series. Explicit QC metadata and audits address common usability barriers in raw public meteorological archives, and the coordinate plausibility check plus station manifests provide basic validation.
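The coordinate plausibility check can be pictured as a simple bounding box. The exact envelope is not given here, so the limits below are an assumption. They cover Brazil's approximate mainland extremes, plus a margin wide enough to include oceanic islands such as Fernando de Noronha and Trindade. The station codes `X001` and `X002` are hypothetical.

```python
# Hypothetical envelope in decimal degrees; not the paper's published limits.
LAT_MIN, LAT_MAX = -34.0, 5.5
LON_MIN, LON_MAX = -74.5, -28.5


def in_brazil_envelope(lat: float, lon: float) -> bool:
    """True if (lat, lon) falls inside the broad rectangular envelope."""
    return LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX


def split_manifest(stations: dict) -> tuple:
    """Partition {code: (lat, lon)} into sorted retained and rejected station codes."""
    kept = sorted(c for c, (lat, lon) in stations.items() if in_brazil_envelope(lat, lon))
    rejected = sorted(set(stations) - set(kept))
    return kept, rejected


manifest = {
    "A001": (-15.79, -47.93),  # plausible inland coordinates
    "X001": (-15.79, 47.93),   # hypothetical station with a sign-flipped longitude
    "X002": (0.0, 0.0),        # hypothetical station with a placeholder origin
}
print(split_manifest(manifest))
```

A rectangle is deliberately coarse. It catches sign errors, swapped latitude and longitude, and zero placeholders, but not a plausible-looking coordinate at the wrong place inside Brazil.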

minor comments (2)
  1. [Abstract] The 'broad Brazil plausibility envelope' used for coordinate validation is not defined; a brief description of, or reference to, the exact bounding box or method would improve transparency.
  2. The manuscript would benefit from a summary table (e.g., in §3 or §4) listing per-variable station counts, temporal coverage ranges, and missing-data percentages to support the headline statistics.
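The table suggested in comment 2 could be derived mechanically from the per-station summary files. The sketch assumes each summary row carries a station code, a variable name, start and end years, and missing versus total hour counts. These field names are hypothetical, and the release's actual columns may differ.

```python
from collections import defaultdict


def summarize(rows: list) -> list:
    """Aggregate per-(station, variable) rows into one summary dict per variable."""
    acc = defaultdict(lambda: {"stations": set(), "start": None, "end": None,
                               "missing": 0, "total": 0})
    for r in rows:
        a = acc[r["variable"]]
        a["stations"].add(r["station"])
        a["start"] = r["start_year"] if a["start"] is None else min(a["start"], r["start_year"])
        a["end"] = r["end_year"] if a["end"] is None else max(a["end"], r["end_year"])
        a["missing"] += r["missing_hours"]
        a["total"] += r["total_hours"]
    table = []
    for var in sorted(acc):
        a = acc[var]
        pct = 100.0 * a["missing"] / a["total"] if a["total"] else float("nan")
        table.append({"variable": var, "n_stations": len(a["stations"]),
                      "start": a["start"], "end": a["end"], "missing_pct": round(pct, 1)})
    return table


rows = [
    {"station": "A001", "variable": "precip_mm", "start_year": 2000, "end_year": 2025,
     "missing_hours": 100, "total_hours": 1000},
    {"station": "A002", "variable": "precip_mm", "start_year": 2008, "end_year": 2020,
     "missing_hours": 300, "total_hours": 1000},
    {"station": "A001", "variable": "air_temp_c", "start_year": 2000, "end_year": 2025,
     "missing_hours": 0, "total_hours": 1000},
]
for line in summarize(rows):
    print(line)
```

Pooling missing and total hours weights each station by its record length. A per-station median of missing percentages would be an alternative if short records should count equally.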

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of the METBRA25Y manuscript and for recommending acceptance. We are pleased that the description of the harmonized archive, the two-stage QC protocol, and the provided metadata were viewed as addressing common usability barriers in public meteorological data.

Circularity Check

0 steps flagged

No circularity: direct description of external data ingestion and harmonization

Full rationale

The paper is a data release manuscript whose central claim is the existence and documentation of a harmonized archive produced by ingesting public INMET annual files, parsing headers, normalizing column names, constructing timestamps, applying two-stage QC (implausibility flagging followed by diagnostic consistency checks), and exporting CSVs with manifests and audits. No equations, fitted parameters, predictions, or uniqueness theorems appear. All quantitative statements (616 stations, 605 with plausible coordinates, 2000-2025 coverage, variable list) are summary statistics of the processed output rather than derived claims that reduce to self-referential definitions or self-citations. The workflow is presented as a transparent, externally verifiable sequence applied to an independent public source; the weakest assumption (raw INMET accuracy) is explicitly mitigated by per-station flags and missing-data audits rather than asserted. No load-bearing step reduces to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The work rests on the domain assumption that the source INMET records are reliable ground truth and that the described parsing and QC steps preserve the original information without distortion. No free parameters or new entities are introduced.

axioms (1)
  • domain assumption Public INMET historical archives constitute the authoritative source of Brazilian surface meteorological observations.
    The paper ingests directly from these records and treats them as the starting point for harmonization.

pith-pipeline@v0.9.0 · 5594 in / 1377 out tokens · 40453 ms · 2026-05-12T00:59:05.026427+00:00 · methodology


Reference graph

Works this paper leans on

16 extracted references · 16 canonical work pages

  1. [1]

    BDMEP – Banco de Dados Meteorologicos para Ensino e Pesquisa,

    Instituto Nacional de Meteorologia (INMET), “BDMEP – Banco de Dados Meteorologicos para Ensino e Pesquisa,” 2026. [Online]. Available: https://bdmep.inmet.gov.br/. Accessed: 2026-05-02

  2. [2]

    Catalogo de Estacoes Automaticas,

    Instituto Nacional de Meteorologia (INMET), “Catalogo de Estacoes Automaticas,” 2026. [Online]. Available: https://portal.inmet.gov.br/paginas/catalogoaut. Accessed: 2026-05-02

  3. [3]

    World Meteorological Organization, Guide to Climatological Practices, WMO-No. 100. Geneva, Switzerland: World Meteorological Organization, 2011

  4. [4]

    World Meteorological Organization, Guide to Instruments and Methods of Observation (WMO-No. 8). [Online]. Available: https://wmo.int/guide-instruments-and-methods-of-observation-wmo-no-8-0. Accessed: 2026-05-02

  5. [5]

    Wilkinson, Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, Jan-Willem Boiten, Luiz Bonino da Silva Santos, Philip E

    M. D. Wilkinson et al., “The FAIR Guiding Principles for scientific data management and stewardship,” Scientific Data, vol. 3, p. 160018, 2016, doi: 10.1038/sdata.2016.18

  6. [6]

    The Integrated Surface Database: Recent Developments and Partnerships,

    A. Smith, N. Lott, and R. Vose, “The Integrated Surface Database: Recent Developments and Partnerships,” Bulletin of the American Meteorological Society, vol. 92, no. 6, pp. 704–708, 2011, doi: 10.1175/2011BAMS3015.1

  7. [7]

    HadISD: a quality-controlled global synoptic report database for selected variables at long-term stations from 1973–2011,

    R. J. H. Dunn et al., “HadISD: a quality-controlled global synoptic report database for selected variables at long-term stations from 1973–2011,” Climate of the Past, vol. 8, no. 5, pp. 1649–1679, 2012, doi: 10.5194/cp-8-1649-2012

  8. [8]

    Expanding HadISD: quality-controlled, sub-daily station data from 1931,

    R. J. H. Dunn, K. M. Willett, D. E. Parker, and L. Mitchell, “Expanding HadISD: quality-controlled, sub-daily station data from 1931,” Geoscientific Instrumentation, Methods and Data Systems, vol. 5, pp. 473–491, 2016, doi: 10.5194/gi-5-473-2016

  9. [9]

    Comprehensive Automated Quality Assurance of Daily Surface Observations,

    I. Durre, M. J. Menne, B. E. Gleason, T. G. Houston, and R. S. Vose, “Comprehensive Automated Quality Assurance of Daily Surface Observations,” Journal of Applied Meteorology and Climatology, vol. 49, no. 8, pp. 1615–1633, 2010, doi: 10.1175/2010JAMC2375.1

  10. [10]

    An Overview of the Global Historical Climatology Network-Daily Database,

    M. J. Menne, I. Durre, R. S. Vose, B. E. Gleason, and T. G. Houston, “An Overview of the Global Historical Climatology Network-Daily Database,” Journal of Atmospheric and Oceanic Technology, vol. 29, no. 7, pp. 897–910, 2012, doi: 10.1175/JTECH-D-11-00103.1

  11. [11]

    Quality control of a global hourly rainfall dataset,

    E. Lewis et al., “Quality control of a global hourly rainfall dataset,” Environmental Modelling & Software, vol. 144, p. 105169, 2021, doi: 10.1016/j.envsoft.2021.105169

  12. [12]

    Quality-controlled meteorological datasets from SIGMA automatic weather stations in northwest Greenland, 2012–2020,

    M. Nishimura et al., “Quality-controlled meteorological datasets from SIGMA automatic weather stations in northwest Greenland, 2012–2020,” Earth System Science Data, vol. 15, pp. 5207–5226, 2023, doi: 10.5194/essd-15-5207-2023

  13. [13]

    Daily gridded meteorological variables in Brazil (1980–2013),

    A. C. Xavier, C. W. King, and B. R. Scanlon, “Daily gridded meteorological variables in Brazil (1980–2013),” International Journal of Climatology, vol. 36, no. 6, pp. 2644–2659, 2016, doi: 10.1002/joc.4518

  14. [14]

    New improved Brazilian daily weather gridded data (1961–2020),

    A. C. Xavier, B. R. Scanlon, C. W. King, and A. I. Alves, “New improved Brazilian daily weather gridded data (1961–2020),” International Journal of Climatology, vol. 42, no. 16, pp. 8390–8404, 2022, doi: 10.1002/joc.7731

  15. [15]

    Gap Filling and Quality Control Applied to Meteorological Variables Measured in the Northeast Region of Brazil,

    R. L. Costa et al., “Gap Filling and Quality Control Applied to Meteorological Variables Measured in the Northeast Region of Brazil,” Atmosphere, vol. 12, no. 10, p. 1278, 2021, doi: 10.3390/atmos12101278

  16. [16]

    Süzen, Supplement and dataset for bootstrapped time-integrated spread complexity, 10.5281/zenodo.19707590 (2026)

    M. Lima Castro, L. Lusquino Filho, and W. Dantas Vichete, “METBRA25Y: Brazil Surface Meteorology Archive with Harmonized Variables and Quality Control,” Zenodo, version v1.0.0, 2026, doi: 10.5281/zenodo.19964979.