arxiv: 2606.24891 · v1 · pith:HUFFKULXnew · submitted 2026-05-19 · 💻 cs.PL · cs.AI · cs.SE

Type Checking Project Haystack Grids using JSON Schema and Pydantic

Pith reviewed 2026-06-30 17:34 UTC · model grok-4.3

classification 💻 cs.PL · cs.AI · cs.SE
keywords: Project Haystack, Pydantic, JSON Schema, type checking, building ontology, data validation, Python

The pith

A Python toolchain parses Haystack definitions and generates Pydantic models plus JSON Schemas for grid validation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a parser for Haystack definition files in Trio format together with a code generator that turns those files into Pydantic models and JSON Schema definitions. These artifacts support static type checking and structural validation of Haystack grids inside Python. The same JSON Schemas also permit validation of JSON data representations in any environment outside Python. The work targets the practical barriers created by custom file formats and limited automated checks in a widely used building ontology.
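To make the idea concrete, the following is a minimal sketch of what a generated model might look like, assuming Pydantic v2. The `Marker` and `Site` classes, the `kind` field, and the sample rows are hypothetical stand-ins written for this illustration; they are not taken from the paper's generated output, and real Haystack JSON encodings carry more detail.

```python
from typing import Literal, Optional

from pydantic import BaseModel, ValidationError


class Marker(BaseModel):
    """Hypothetical, simplified model of a Haystack marker tag in JSON form."""

    kind: Literal["marker"]


class Site(BaseModel):
    """Hypothetical stand-in for a model derived from a `site` definition."""

    id: str
    dis: str
    site: Marker                   # required marker tag
    area: Optional[float] = None   # optional numeric tag


def is_valid(row: dict) -> bool:
    """Return True if the row passes structural validation against Site."""
    try:
        Site.model_validate(row)
    except ValidationError:
        return False
    return True


valid_row = {"id": "s1", "dis": "HQ", "site": {"kind": "marker"}}
invalid_row = {"id": "s1", "dis": "HQ"}  # the required marker is missing

site = Site.model_validate(valid_row)

# The same class also yields a JSON Schema usable by non-Python tooling.
schema = Site.model_json_schema()
```

Because `Site` is an ordinary class with annotated fields, a static type checker can flag misuse such as reading a misspelled attribute, while `model_validate` performs the structural check at runtime.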

Core claim

Parsing Haystack definition files and generating Pydantic models and JSON Schema definitions from the parsed specifications enables static type checking and structural validation of Haystack grids within Python as well as schema-based validation of JSON representations outside the Python ecosystem.

What carries the argument

The code generator that derives Pydantic models and JSON Schema definitions from parsed Haystack Trio definition files.

Load-bearing premise

Haystack definitions can be translated into Pydantic models and JSON Schemas without introducing new semantic ambiguities or overlooking existing ones that arise from tag usage.

What would settle it

A concrete Haystack grid that the generated Pydantic model accepts yet violates an original Haystack rule, or that the model rejects yet satisfies the rule.
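A search for such a counterexample can be sketched as a comparison of two verdicts per row. Everything below is illustrative: `reference_rule` encodes an assumed rule (a point carries exactly one of `sensor`, `cmd`, `sp`) chosen only for the example, and `structural_check` is a hypothetical stand-in for a generated model that checks only required tags and types.

```python
from typing import Callable, Iterable

Row = dict


def reference_rule(row: Row) -> bool:
    """Assumed rule for illustration: a point has exactly one of sensor/cmd/sp."""
    if "point" not in row:
        return True
    return sum(tag in row for tag in ("sensor", "cmd", "sp")) == 1


def structural_check(row: Row) -> bool:
    """Stand-in for a generated model: checks only the id type and the point tag."""
    return isinstance(row.get("id"), str) and "point" in row


def find_disagreements(
    rows: Iterable[Row],
    reference: Callable[[Row], bool],
    model: Callable[[Row], bool],
) -> list:
    """Collect rows on which the reference rule and the model disagree."""
    disagreements = []
    for row in rows:
        ref_ok, model_ok = reference(row), model(row)
        if ref_ok != model_ok:
            disagreements.append((row, ref_ok, model_ok))
    return disagreements


rows = [
    {"id": "p1", "point": True, "sensor": True},
    {"id": "p2", "point": True, "sensor": True, "cmd": True},  # two functions
]
found = find_disagreements(rows, reference_rule, structural_check)
```

Here the second row is accepted by the structural check but rejected by the assumed rule, which is the kind of disagreement that would settle the question if it occurred with the real generated models.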

Original abstract

Ontologies enable scalable energy services in buildings by supporting interoperability and automation. Project Haystack is a building ontology that is widely adopted due to its flexible, tag-based semantic model, openness, and extensibility, but suffers from ambiguous tag usage and limited automated validation. Although Project Haystack is formally open, its reliance on custom file formats and domain-specific languages that originate from the Haxall ecosystem creates a de facto barrier to integration. In this paper, we address these limitations by introducing a Python-based toolchain for Haystack. We present (i) a parser for Haystack definition files (Trio file format), and (ii) a code generator that derives Pydantic models and JSON Schema definitions from these parsed specifications. The resulting models enable static type checking and enable structural validation of Haystack grids within Python, as well as schema-based validation of JSON representations outside the Python ecosystem. All tools, generated models, and schemas are released publicly under an open-source license, with the goal of strengthening the Haystack ecosystem and opening a practical pathway beyond its current technical boundaries.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims to introduce a Python toolchain for Project Haystack consisting of (i) a parser for Trio definition files and (ii) a code generator producing Pydantic models and JSON Schemas. These artifacts are said to enable static type checking and structural validation of Haystack grids inside Python as well as schema-based JSON validation outside Python; all components are released publicly under an open-source license.

Significance. If the generated models and schemas faithfully capture Haystack semantics (including resolution of the acknowledged tag ambiguities), the work would provide a concrete, reusable bridge between the Haystack ecosystem and mainstream Python/JSON tooling, lowering integration barriers for building-energy applications. The explicit public release of parser, generator, models, and schemas is a verifiable strength that permits direct community inspection and reuse.

major comments (2)
  1. [Abstract and code-generator description] The manuscript states that the toolchain addresses ambiguous tag usage but supplies neither a description of the resolution strategy nor any concrete examples of how ambiguous tags are represented (or rejected) in the generated Pydantic models or JSON Schemas. This omission directly undermines the central claim that the artifacts enable reliable structural validation.
  2. [Contributions and evaluation] No evaluation section, test suite, or sample Haystack grid is shown to demonstrate that the generated models accept valid grids and reject invalid ones; the claim therefore rests solely on the existence of the released artifacts rather than on evidence presented in the paper.
minor comments (2)
  1. The Trio parser description would benefit from a short grammar fragment or example Trio snippet together with the corresponding generated Pydantic class.
  2. Consider adding a table that maps a representative set of Haystack tags to the resulting Pydantic field types and JSON Schema constraints.
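As an illustration of the kind of example the first minor comment asks for, here is a minimal sketch of reading Trio-style records with the standard library only. It is not the paper's parser: it assumes records separated by `---` lines, `//` comments, bare names as marker tags, `name: value` pairs, and indented continuation lines, and it keeps every value as a raw string. The sample definitions are written for this sketch and are not copied from the official Haystack defs.

```python
def parse_trio(text: str) -> list[dict]:
    """Parse a simplified Trio-style text into a list of tag dictionaries."""
    records, current, last_key = [], {}, None
    for raw in text.splitlines():
        line = raw.rstrip()
        # Skip blank lines and comment lines.
        if not line.strip() or line.lstrip().startswith("//"):
            continue
        # A line starting with dashes closes the current record.
        if line.startswith("---"):
            if current:
                records.append(current)
            current, last_key = {}, None
            continue
        # An indented line continues the previous string-valued tag.
        if line[0] in " \t" and isinstance(current.get(last_key), str):
            prev = current[last_key]
            current[last_key] = (prev + "\n" if prev else "") + line.strip()
            continue
        # "name: value" is a raw string tag; a bare name is a marker (True).
        name, sep, value = line.partition(":")
        last_key = name.strip()
        current[last_key] = value.strip() if sep else True
    if current:
        records.append(current)
    return records


SAMPLE = """\
// illustrative definitions written for this sketch
def: ^site
is: ^entity
mandatory
doc:
  A site is a single building
  with its own address.
---
def: ^area
is: ^number
"""

records = parse_trio(SAMPLE)
```

A code generator could then walk `records` and emit one model class per definition; how the paper's generator maps tags to field types is not described in this review.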

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We agree that the two major comments identify genuine gaps in the current manuscript. We will revise the paper to address both points directly.

Point-by-point responses
  1. Referee: [Abstract and code-generator description] The manuscript states that the toolchain addresses ambiguous tag usage but supplies neither a description of the resolution strategy nor any concrete examples of how ambiguous tags are represented (or rejected) in the generated Pydantic models or JSON Schemas. This omission directly undermines the central claim that the artifacts enable reliable structural validation.

    Authors: We agree that the manuscript currently mentions the handling of ambiguous tags without providing a concrete description of the resolution strategy or examples. In the revised version we will add a dedicated subsection under the code-generator description that (i) explains the disambiguation rules applied during parsing and model generation, (ii) shows the resulting Pydantic field definitions and JSON Schema constraints for representative ambiguous cases, and (iii) illustrates how invalid tag combinations are rejected at both the Pydantic and JSON-Schema levels.
    Revision: yes

  2. Referee: [Contributions and evaluation] No evaluation section, test suite, or sample Haystack grid is shown to demonstrate that the generated models accept valid grids and reject invalid ones; the claim therefore rests solely on the existence of the released artifacts rather than on evidence presented in the paper.

    Authors: We acknowledge the lack of an evaluation section. The revised manuscript will include a new Evaluation section that (i) describes the test suite developed for the generated models and schemas, (ii) presents several sample Haystack grids (both valid and deliberately invalid), and (iii) reports the outcomes of validation runs demonstrating acceptance of correct grids and rejection of incorrect ones. The test artifacts will be made available in the public repository.
    Revision: yes
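An evaluation of the kind promised here could take the shape of a table of rows with expected verdicts. The sketch below is a hedged illustration using only the standard library: `SITE_SCHEMA` is a hypothetical schema, not one of the released artifacts, and `conforms` checks only a tiny subset of JSON Schema (`type`, `required`, `properties`), so it is not a conforming validator.

```python
import json

# Hypothetical schema for one grid row, written for this sketch.
SITE_SCHEMA = json.loads("""
{
  "type": "object",
  "required": ["id", "dis", "site"],
  "properties": {
    "id": {"type": "string"},
    "dis": {"type": "string"},
    "area": {"type": "number"}
  }
}
""")

_TYPES = {"object": dict, "string": str, "number": (int, float)}


def conforms(value, schema) -> bool:
    """Check only the subset used above: type, required, properties."""
    expected = schema.get("type")
    if expected is not None:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, _TYPES[expected]):
            return False
    if isinstance(value, dict):
        if any(key not in value for key in schema.get("required", [])):
            return False
        for key, sub in schema.get("properties", {}).items():
            if key in value and not conforms(value[key], sub):
                return False
    return True


MARKER = {"_kind": "marker"}
CASES = [
    ("valid site", {"id": "s1", "dis": "HQ", "site": MARKER, "area": 1200.0}, True),
    ("missing marker", {"id": "s1", "dis": "HQ"}, False),
    ("area as string", {"id": "s1", "dis": "HQ", "site": MARKER, "area": "big"}, False),
]


def run_cases(cases, schema) -> list:
    """Return the names of cases whose verdict differs from the expectation."""
    return [name for name, row, expected in cases if conforms(row, schema) != expected]


failures = run_cases(CASES, SITE_SCHEMA)
```

An empty `failures` list means every valid row was accepted and every deliberately invalid row was rejected. A real evaluation would swap `conforms` for a full JSON Schema validator and the generated schemas.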

Circularity Check

0 steps flagged

No derivation chain present; implementation artifact only

full rationale

The paper introduces a parser for Haystack Trio files and a code generator producing Pydantic models plus JSON Schemas. No equations, predictions, fitted parameters, or first-principles derivations exist that could reduce to inputs by construction. The central claim is the release of publicly verifiable open-source artifacts whose correctness is independent of any self-citation chain and can be checked directly against Haystack grids. This matches the default expectation of no significant circularity for non-theoretical contributions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work rests on the domain assumption that Haystack's Trio definitions are sufficiently structured to support lossless parsing and model generation, plus the implicit claim that generated Pydantic/JSON Schema artifacts preserve the original semantics.

axioms (1)
  • Domain assumption: Haystack definition files in Trio format can be parsed without loss of the semantic information needed for validation.
    Parser component is presented as directly consuming these files to produce models.

pith-pipeline@v0.9.1-grok · 5724 in / 1072 out tokens · 35569 ms · 2026-06-30T17:34:34.759871+00:00 · methodology

