JSON Schema Inclusion through Refutational Normalization: Reconciling Efficiency and Completeness

Carlo Sartiani; Dario Colazzo; Giorgio Ghelli; Mohamed-Amine Baazizi; Nour El Houda Ben Ali; Stefanie Scherzinger; Stefan Klessinger

arxiv: 2603.25306 · v2 · submitted 2026-03-26 · 💻 cs.DB

JSON Schema Inclusion through Refutational Normalization: Reconciling Efficiency and Completeness

Mohamed-Amine Baazizi , Nour El Houda Ben Ali , Dario Colazzo , Giorgio Ghelli , Stefan Klessinger , Carlo Sartiani , Stefanie Scherzinger This is my paper

Pith reviewed 2026-05-15 00:55 UTC · model grok-4.3

classification 💻 cs.DB

keywords JSON Schemaschema inclusionrefutational normalizationwitness generationcompletenessefficiencyschema compatibility

0 comments

The pith

Refutational normalization reconciles efficiency and completeness for JSON Schema inclusion checks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper targets the problem of checking JSON Schema inclusion, that is, whether every document valid under one schema is also valid under another. Rule-based methods run quickly but miss some inclusions, while witness-generation methods are complete yet often too slow for practical use. The authors enrich witness generation with a specialized normalization step that refutes potential counterexamples early. This hybrid approach delivers both the speed of incomplete methods and the guarantees of complete ones. Experiments on real-world and synthetic schemas confirm large efficiency gains and enable previously intractable checks.

Core claim

By enriching witness-generation techniques with refutational normalization, the method achieves both the efficiency of rule-based procedures and the completeness of witness generation for determining JSON Schema inclusion.

What carries the argument

Refutational normalization, a specialized normalization process integrated into witness generation that refutes potential mismatches to retain completeness while improving runtime performance.

If this is right

Inclusion checks for schemas too complex for existing complete tools become feasible in practice.
Tasks such as API version compatibility, schema refactoring, and large-scale corpus analysis gain reliable automation.
Experiments show the method advances the state of the art by combining speed and completeness on both real and synthetic inputs.
The technique supports use cases previously limited by either incompleteness or excessive runtime.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same normalization idea could transfer to inclusion problems in other schema languages such as XML Schema.
Adoption in schema-management tools would accelerate workflows involving data migration and validation.
Further tuning for common real-world schema patterns could yield additional speedups.

Load-bearing premise

Refutational normalization preserves the completeness of witness generation while producing substantial efficiency gains across real and synthetic schemas.

What would settle it

A pair of JSON schemas on which the new method either reports an incorrect inclusion result or fails to run substantially faster than prior complete witness-generation algorithms.

Figures

Figures reproduced from arXiv: 2603.25306 by Carlo Sartiani, Dario Colazzo, Giorgio Ghelli, Mohamed-Amine Baazizi, Nour El Houda Ben Ali, Stefanie Scherzinger, Stefan Klessinger.

**Figure 2.** Figure 2: Structural rules [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗

**Figure 3.** Figure 3: Grammar of canonical conjunctions. presented in Section 4.1 : a canonical conjunction𝐶 is either a typeset conjunction 𝐶𝑇𝑆 or a typed conjunction (𝐶𝑂, 𝐶𝐴, etc.) of stratified typed operators. C-references. The normalization process creates new references when it merges related operators. For example, assume we normalize the following (𝑆, 𝐸) document: ( all[ all[type(object), pProp( ⌊𝑎⌋ : ref(𝑥)) ], all[… view at source ↗

**Figure 4.** Figure 4: notPush(S). Function allCS(𝐶, 𝑆, 𝐸) works by cases on 𝑆. It just applies distributivity when 𝑆 = any[. . .] (lines 8 - 11) [PITH_FULL_IMAGE:figures/full_fig_p009_4.png] view at source ↗

**Figure 5.** Figure 5: Failure rates of tools across datasets (lower is bet [PITH_FULL_IMAGE:figures/full_fig_p016_5.png] view at source ↗

read the original abstract

JSON Schema is the de facto standard for describing the structure of JSON documents. Reasoning about JSON Schema inclusion -- whether every instance satisfying a schema S1 also satisfies a schema S2 -- is a key building block for a variety of tasks, including version and API compatibility checks, schema refactoring tools, and large-scale schema corpus analysis. Existing approaches fall into two families: rule-based algorithms that are efficient but incomplete and witness generation-based algorithms that are complete but oftentimes extremely slow. This paper introduces a new approach that reconciles the efficiency of rule-based procedures with the completeness of the witness-generation technique, by enriching the latter with a specialized form of normalization. This refutational normalization paves the way for use-cases that are too hard for current tools. Our experiments with real-world and synthetic schemas show that the refutational normalization greatly advances the state-of-the-art in JSON Schema inclusion checking.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Refutational normalization gives practical speed-ups for JSON Schema inclusion but the completeness argument needs explicit formal support.

read the letter

The main point is that the authors add a normalization pass before witness generation to cut down the search space for JSON Schema inclusion checks. This is positioned as a way to keep the completeness of witness methods while borrowing efficiency from rule-based ones, and the experiments on real-world plus synthetic schemas report clear runtime gains that matter for compatibility tools and corpus analysis. That practical side is the paper's clearest strength. The experiments appear to be the part that actually moves the needle for users who need faster checks without giving up soundness on the tested cases. The soft spot is exactly the one the stress-test flags: the claim that refutational normalization preserves the full set of counterexamples rests on the normalization rules not collapsing distinct instance sets, yet the abstract gives no reduction to a known complete procedure and no inductive argument over constructs like anyOf or additionalProperties. If those rules are not shown to be semantics-preserving in the full text, the completeness guarantee is unverified even if the runtimes look good. This work is aimed at people building schema tools in databases and API engineering rather than pure theorists. A reader who needs faster inclusion checks for refactoring or version compatibility would find the experimental results useful. It deserves a serious referee because the idea is concrete, the use-case is real, and the experiments are relevant, even though the formal side will probably need tightening in revision.

Referee Report

2 major / 2 minor

Summary. The paper introduces refutational normalization as a preprocessing step applied before witness generation for deciding JSON Schema inclusion (whether every instance of schema S1 satisfies S2). It claims this reconciles the efficiency of rule-based methods with the completeness of witness generation, with experiments on real-world and synthetic schemas showing substantial runtime improvements over prior approaches.

Significance. If the normalization rules preserve the exact set of counterexamples for non-inclusion, the technique would make complete inclusion checking practical for complex schemas, directly benefiting schema compatibility analysis, API versioning, and large-scale corpus studies where current tools are either incomplete or too slow.

major comments (2)

The central claim requires that refutational normalization preserves completeness, i.e., if S1 ⊈ S2 then the normalized pair still admits a witness of non-inclusion. No reduction to a known complete procedure, inductive argument over JSON Schema constructs (e.g., anyOf combined with additionalProperties), or semantic-preservation proof is supplied, leaving the completeness guarantee unestablished.
Experimental evaluation reports speed-ups but supplies no quantitative verification that completeness was retained (e.g., no count of cases where a witness was found post-normalization that would have been missed by rule-based methods alone, and no baseline comparison details on the witness-generation implementation used).

minor comments (2)

The abstract asserts 'greatly advances the state-of-the-art' without citing specific speed-up factors or schema sizes; adding one or two concrete numbers would strengthen the claim.
Normalization rules should be presented with explicit formal definitions (e.g., as rewrite rules with preconditions) rather than informal descriptions to support reproducibility and future proof attempts.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive feedback. We will revise the manuscript to supply the missing formal argument for completeness preservation and to strengthen the experimental section with quantitative completeness checks.

read point-by-point responses

Referee: The central claim requires that refutational normalization preserves completeness, i.e., if S1 ⊈ S2 then the normalized pair still admits a witness of non-inclusion. No reduction to a known complete procedure, inductive argument over JSON Schema constructs (e.g., anyOf combined with additionalProperties), or semantic-preservation proof is supplied, leaving the completeness guarantee unestablished.

Authors: We agree that the current manuscript lacks an explicit completeness argument. In the revision we will add a semantic-preservation proof by induction on JSON Schema constructs. The argument will show that each normalization rule preserves the existence of counterexample witnesses (when they exist) and will explicitly treat combinations such as anyOf with additionalProperties by demonstrating that no valid counterexample is eliminated. revision: yes
Referee: Experimental evaluation reports speed-ups but supplies no quantitative verification that completeness was retained (e.g., no count of cases where a witness was found post-normalization that would have been missed by rule-based methods alone, and no baseline comparison details on the witness-generation implementation used).

Authors: We acknowledge that the experiments do not yet report quantitative completeness verification. In the revised version we will add tables counting the additional witnesses discovered after normalization (i.e., cases where rule-based methods alone would have failed to detect non-inclusion) and will document the exact witness-generation implementation, its configuration parameters, and how it was used as the complete baseline. revision: yes

Circularity Check

0 steps flagged

No circularity in derivation chain

full rationale

The provided abstract and description introduce refutational normalization as a technique to reconcile efficiency and completeness for JSON Schema inclusion checking. No equations, self-definitional reductions, fitted inputs renamed as predictions, or load-bearing self-citations are visible. The central claim rests on experimental speed-ups and an asserted preservation property, but the text supplies no derivation that reduces the result to its own inputs by construction. This is a standard case of an independent algorithmic contribution without circular structure.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities.

pith-pipeline@v0.9.0 · 5476 in / 941 out tokens · 35863 ms · 2026-05-15T00:55:50.437160+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We introduce a notion of refutational normalization, an approach to normalization that internalizes the efficient behavior of the inclusion rules, that is optimized for a fast detection of situations where the normalized term is not satisfiable
IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean absolute_floor_iff_bare_distinguishability unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

DNF(all[S1, not(S2)]) ... lazy normalization ... fast complement-absorption ... eager reference-evaluation

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

43 extracted references · 43 canonical work pages

[1]

JSON schema validator

2026. JSON schema validator. https://github.com/networknt/json-schema- validator Retrieved 12 February 2026

work page 2026
[2]

jsonschema

2026. jsonschema. https://github.com/python-jsonschema/jsonschema Re- trieved 12 February 2026

work page 2026
[3]

Lyes Attouche, Mohamed-Amine Baazizi, Dario Colazzo, Giorgio Ghelli, Stefan Klessinger, Carlo Sartiani, and Stefanie Scherzinger. 2026. Elimination of anno- tation dependencies in validation for Modern JSON Schema. Theor. Comput. Sci. 1063 (2026), 115645. https://doi.org/10.1016/J.TCS.2025.115645

work page doi:10.1016/j.tcs.2025.115645 2026
[4]

Lyes Attouche, Mohamed Amine Baazizi, Dario Colazzo, Yunchen Ding, Michael Fruth, Giorgio Ghelli, Carlo Sartiani, and Stefanie Scherzinger. 2021. A Test Suite for JSON Schema Containment. In Proc. ER 2021. 19–24

work page 2021
[5]

Lyes Attouche, Mohamed-Amine Baazizi, Dario Colazzo, Giorgio Ghelli, Ste- fan Klessinger, Carlo Sartiani, and Stefanie Scherzinger. 2022. Repro- duction Package. Available on Zenodo at https://doi.org/10.5281/ zenodo.7106750 and maintained on GitHub at https://github.com/sdbs-uni-p/ JSONSchemaWitnessGeneration

work page 2022
[6]

Lyes Attouche, Mohamed Amine Baazizi, Dario Colazzo, Giorgio Ghelli, Carlo Sartiani, and Stefanie Scherzinger. 2022. Witness Generation for JSON Schema. JSON Schema Inclusion through Refutational Normalization: Reconciling Efficiency and Completeness Proc. VLDB Endow. 15, 13 (2022), 4002–4014. https://www.vldb.org/pvldb/vol15/ p4002-sartiani.pdf

work page 2022
[7]

Lyes Attouche, Mohamed-Amine Baazizi, Dario Colazzo, Giorgio Ghelli, Carlo Sartiani, and Stefanie Scherzinger. 2023. Reproduction Package for: Validation of Modern JSON Schema: Formalization and Complexity . https://doi.org/10.5281/ zenodo.10019663

work page 2023
[8]

Lyes Attouche, Mohamed-Amine Baazizi, Dario Colazzo, Giorgio Ghelli, Carlo Sartiani, and Stefanie Scherzinger. 2026. Witness Generation for Classical JSON Schema. ACM Trans. Database Syst. (Feb. 2026). https://doi.org/10.1145/3799416 Just Accepted

work page doi:10.1145/3799416 2026
[9]

McGuinness, Daniele Nardi, and Pe- ter F

Franz Baader, Diego Calvanese, Deborah L. McGuinness, Daniele Nardi, and Pe- ter F. Patel-Schneider (Eds.). 2003. The Description Logic Handbook: Theory, Im- plementation, and Applications. Cambridge University Press

work page 2003
[10]

Franz Baader and Ulrike Sattler. 2001. An Overview of Tableau Algorithms for Description Logics. Stud Logica 69, 1 (2001), 5–40. https://doi.org/10.1023/A: 1013882326814

work page doi:10.1023/a: 2001
[11]

Mohamed Amine Baazizi, Clément Berti, Dario Colazzo, Giorgio Ghelli, and Carlo Sartiani. 2020. Human-in-the-Loop Schema Inference for Massive JSON Datasets. In EDBT. OpenProceedings.org, 635–638

work page 2020
[12]

Mohamed Amine Baazizi, Dario Colazzo, Giorgio Ghelli, and Carlo Sartiani. 2019. Parametric schema inference for massive JSON datasets. VLDB J. 28, 4 (2019), 497–521

work page 2019
[13]

Scalable Funding of Bitcoin Micropayment Channel Networks

Mohamed Amine Baazizi, Dario Colazzo, Giorgio Ghelli, Carlo Sartiani, and Ste- fanie Scherzinger. 2021. An Empirical Study on the "Usage of Not" in Real-World JSON Schema Documents. In Proc. ER. 102–112. https://doi.org/10.1007/978-3- 030-89022-3_9

work page doi:10.1007/978-3- 2021
[14]

Mohamed Amine Baazizi, Dario Colazzo, Giorgio Ghelli, Carlo Sartiani, and Ste- fanie Scherzinger. 2023. Negation-closure for JSON Schema. Theor. Comput. Sci. 955 (2023), 113823. https://doi.org/10.1016/j.tcs.2023.113823

work page doi:10.1016/j.tcs.2023.113823 2023
[15]

Fernando Suárez Barría. 2016. Formal Specification, Expressiveness, and Complex- ity Analysis for JSON Schema . Master’s thesis. Pontificia Universidad Católica de Chile, Santiago, Chile. https://repositorio.uc.cl/handle/11534/16908

work page 2016
[16]

Guillaume Baudart, Martin Hirzel, Kiran Kate, Parikshit Ram, and Avraham Shin- nar. 2020. Lale: Consistent Automated Machine Learning. CoRR abs/2007.01977 (2020). arXiv:2007.01977 https://arxiv.org/abs/2007.01977

work page arXiv 2020
[17]

Filippo Bonchi and Damien Pous. 2013. Checking NFA equivalence with bisim- ulations up to congruence. In The 40th Annual ACM SIGPLAN-SIGACT Sympo- sium on Principles of Programming Languages, POPL ’13 . ACM, 457–468. https: //doi.org/10.1145/2429069.2429124

work page doi:10.1145/2429069.2429124 2013
[18]

Ahmed Bouajjani, Peter Habermehl, Lukás Holík, Tayssir Touili, and Tomás Vo- jnar. 2008. Antichain-Based Universality and Inclusion Testing over Nondeter- ministic Finite Tree Automata. In Implementation and Applications of Automata, 13th International Conference, CIAA 2008 (Lecture Notes in Computer Science) . Springer, 57–67. https://doi.org/10.1007/978...

work page doi:10.1007/978-3-540-70844-5_7 2008
[19]

Reutter, Fernando Suárez, and Domagoj Vrgoč

Pierre Bourhis, Juan L. Reutter, Fernando Suárez, and Domagoj Vrgoc. 2017. JSON: Data model, Query languages and Schema specification. In Proc. PODS . 123–135. https://doi.org/10.1145/3034786.3056120

work page doi:10.1145/3034786.3056120 2017
[20]

Giuseppe Castagna and Alain Frisch. 2005. A gentle introduction to semantic subtyping. In Proceedings of the 7th International ACM SIGPLAN Conference on Principles and Practice of Declarative Programming . ACM, 198–199. https://doi. org/10.1145/1069774.1069793

work page doi:10.1145/1069774.1069793 2005
[21]

Ruan, Yaxing Cai, Ruihang Lai, Ziyi Xu, Yilong Zhao, and Tianqi Chen

Yixin Dong, Charlie F. Ruan, Yaxing Cai, Ruihang Lai, Ziyi Xu, Yilong Zhao, and Tianqi Chen. 2024. XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models. CoRR abs/2411.15100 (2024). https://doi. org/10.48550/ARXIV.2411.15100 arXiv:2411.15100

work page doi:10.48550/arxiv.2411.15100 2024
[22]

Alain Frisch. 2004. Théorie, conception et réalisation d’un langage de program- mation fonctionnel adapté à XML. Ph. D. Dissertation. Université Paris 7 – Denis Diderot

work page 2004
[23]

Alain Frisch, Giuseppe Castagna, and Véronique Benzaken. 2002. Semantic Sub- typing. In 17th IEEE Symposium on Logic in Computer Science (LICS 2002), . 137–

work page 2002
[24]

https://doi.org/10.1109/LICS.2002.1029823

work page doi:10.1109/lics.2002.1029823 2002
[25]

Alain Frisch, Giuseppe Castagna, and Véronique Benzaken. 2008. Semantic sub- typing: Dealing set-theoretically with function, union, intersection, and nega- tion types. J. ACM 55, 4 (2008), 19:1–19:64. https://doi.org/10.1145/1391289. 1391293

work page doi:10.1145/1391289 2008
[26]

Pierre Genevès, Nabil Layaïda, and Alan Schmitt. 2007. Efficient static analysis of XML paths and types. InProceedings of the ACM SIGPLAN 2007 Conference on Programming Language Design and Implementation, San Diego, California, USA, June 10-13, 2007, Jeanne Ferrante and Kathryn S. McKinley (Eds.). ACM, 342–351. https://doi.org/10.1145/1250734.1250773

work page doi:10.1145/1250734.1250773 2007
[27]

Li, and Konstantinos Mamouras

Alexis Le Glaunec, Angela W. Li, and Konstantinos Mamouras. 2026. Streaming Validation of JSON Documents against Schemas.Proc. VLDB Endow.19, 3 (March 2026), 509–522. https://doi.org/10.14778/3778092.3778109

work page doi:10.14778/3778092.3778109 2026
[28]

Andrew Habib, Avraham Shinnar, Martin Hirzel, and Michael Pradel. 2021. Find- ing Data Compatibility Bugs with JSON Subschema Checking. In Proc. ISSTA. 620–632. https://doi.org/10.1145/3460319.3464796

work page doi:10.1145/3460319.3464796 2021
[29]

Patel-Schneider

Ian Horrocks and Peter F. Patel-Schneider. 1999. Optimizing Description Logic Subsumption. J. Log. Comput. 9, 3 (1999), 267–293. https://doi.org/10.1093/ LOGCOM/9.3.267

work page 1999
[30]

Haruo Hosoya. 2001. Regular Expression Types for XML. Ph. D. Dissertation. The University of Tokyo

work page 2001
[31]

Haruo Hosoya and Makoto Murata. 2003. Boolean Operations for Attribute- Element Constraints. In Implementation and Application of Automata, 8th Inter- national Conference, CIAA 2003, Santa Barbara, California, USA, July 16-18, 2003, Proceedings (Lecture Notes in Computer Science) , Oscar H. Ibarra and Zhe Dang (Eds.). Springer, 201–212. https://doi.org/10...

work page doi:10.1007/3-540-45089-0_19 2003
[32]

Haruo Hosoya and Benjamin C. Pierce. 2001. Regular expression pattern match- ing for XML. In Conference Record of POPL 2001: The 28th ACM SIGPLAN- SIGACT Symposium on Principles of Programming Languages, London, UK, Jan- uary 17-19, 2001 , Chris Hankin and Dave Schmidt (Eds.). ACM, 67–80. https: //doi.org/10.1145/360204.360209

work page doi:10.1145/360204.360209 2001
[33]

Haruo Hosoya and Benjamin C. Pierce. 2003. XDuce: A statically typed XML processing language. ACM Trans. Internet Techn. 3, 2 (2003), 117–148. https: //doi.org/10.1145/767193.767195

work page doi:10.1145/767193.767195 2003
[34]

Haruo Hosoya, Jerome Vouillon, and Benjamin C. Pierce. 2005. Regular ex- pression types for XML. ACM Trans. Program. Lang. Syst. 27, 1 (2005), 46–90. https://doi.org/10.1145/1053468.1053470

work page doi:10.1145/1053468.1053470 2005
[35]

Stefan Klessinger, Michael Fruth, Valentin Gittinger, Meike Klettke, Uta Störl, and Stefanie Scherzinger. 2023. Tagger: A Tool for the Discovery of Tagged Unions in JSON Schema Extraction. InProceedings 26th International Conference on Extending Database Technology, EDBT 2023, Ioannina, Greece, March 28-31,

work page 2023
[36]

https://doi.org/10.48786/EDBT.2023.75

OpenProceedings.org, 827–830. https://doi.org/10.48786/EDBT.2023.75

work page doi:10.48786/edbt.2023.75 2023
[37]

Meike Klettke, Uta Störl, and Stefanie Scherzinger. 2015. Schema Extraction and Structural Outlier Detection for JSON-based NoSQL Data Stores. In Proc. BTW (LNI, Vol. P-241). 425–444

work page 2015
[38]

Yu Liu, Duantengchuan Li, Kaili Wang, Zhuoran Xiong, Fobo Shi, Jian Wang, Bing Li, and Bo Hang. 2024. Are LLMs good at structured outputs? A benchmark for evaluating structured output capabilities in LLMs. Inf. Process. Manag. 61, 5 (2024), 103809. https://doi.org/10.1016/J.IPM.2024.103809

work page doi:10.1016/j.ipm.2024.103809 2024
[39]

Reutter, Fernando Suárez, Martín Ugarte, and Domagoj Vrgoc

Felipe Pezoa, Juan L. Reutter, Fernando Suárez, Martín Ugarte, and Domagoj Vrgoc. 2016. Foundations of JSON Schema. In Proc. WWW. 263–273

work page 2016
[40]

William Spoth, Oliver Kennedy, Ying Lu, Beda Christoph Hammerschmidt, and Zhen Hua Liu. 2021. Reducing Ambiguity in Json Schema Discovery. InSIGMOD Conference. ACM, 1732–1744

work page 2021
[41]

Tadahiro Suda and Haruo Hosoya. 2005. Non-backtracking Top-Down Algo- rithm for Checking Tree Automata Containment. In Implementation and Appli- cation of Automata, 10th International Conference, CIAA 2005, Sophia Antipolis, France, June 27-29, 2005, Revised Selected Papers (Lecture Notes in Computer Sci- ence), Jacques Farré, Igor Litovsky, and Sylvain S...

work page doi:10.1007/11605157_25 2005
[42]

Henzinger, and Jean-François Raskin

Martin De Wulf, Laurent Doyen, Thomas A. Henzinger, and Jean-François Raskin. 2006. Antichains: A New Algorithm for Checking Universality of Fi- nite Automata. In Computer Aided Verification, 18th International Conference, CA V 2006, Seattle, W A, USA, August 17-20, 2006, Proceedings (Lecture Notes in Computer Science) , Thomas Ball and Robert B. Jones (E...

work page doi:10.1007/11817963_5 2006
[43]

Joohyung Yun, Byungchul Tak, and Wook-Shin Han. 2024. ReCG: Bottom-up JSON Schema Discovery Using a Repetitive Cluster-and-Generalize Framework. Proc. VLDB Endow.17, 11 (July 2024), 3538–3550. https://doi.org/10.14778/3681954. 3682019

work page doi:10.14778/3681954 2024

[1] [1]

JSON schema validator

2026. JSON schema validator. https://github.com/networknt/json-schema- validator Retrieved 12 February 2026

work page 2026

[2] [2]

jsonschema

2026. jsonschema. https://github.com/python-jsonschema/jsonschema Re- trieved 12 February 2026

work page 2026

[3] [3]

Lyes Attouche, Mohamed-Amine Baazizi, Dario Colazzo, Giorgio Ghelli, Stefan Klessinger, Carlo Sartiani, and Stefanie Scherzinger. 2026. Elimination of anno- tation dependencies in validation for Modern JSON Schema. Theor. Comput. Sci. 1063 (2026), 115645. https://doi.org/10.1016/J.TCS.2025.115645

work page doi:10.1016/j.tcs.2025.115645 2026

[4] [4]

Lyes Attouche, Mohamed Amine Baazizi, Dario Colazzo, Yunchen Ding, Michael Fruth, Giorgio Ghelli, Carlo Sartiani, and Stefanie Scherzinger. 2021. A Test Suite for JSON Schema Containment. In Proc. ER 2021. 19–24

work page 2021

[5] [5]

Lyes Attouche, Mohamed-Amine Baazizi, Dario Colazzo, Giorgio Ghelli, Ste- fan Klessinger, Carlo Sartiani, and Stefanie Scherzinger. 2022. Repro- duction Package. Available on Zenodo at https://doi.org/10.5281/ zenodo.7106750 and maintained on GitHub at https://github.com/sdbs-uni-p/ JSONSchemaWitnessGeneration

work page 2022

[6] [6]

Lyes Attouche, Mohamed Amine Baazizi, Dario Colazzo, Giorgio Ghelli, Carlo Sartiani, and Stefanie Scherzinger. 2022. Witness Generation for JSON Schema. JSON Schema Inclusion through Refutational Normalization: Reconciling Efficiency and Completeness Proc. VLDB Endow. 15, 13 (2022), 4002–4014. https://www.vldb.org/pvldb/vol15/ p4002-sartiani.pdf

work page 2022

[7] [7]

Lyes Attouche, Mohamed-Amine Baazizi, Dario Colazzo, Giorgio Ghelli, Carlo Sartiani, and Stefanie Scherzinger. 2023. Reproduction Package for: Validation of Modern JSON Schema: Formalization and Complexity . https://doi.org/10.5281/ zenodo.10019663

work page 2023

[8] [8]

Lyes Attouche, Mohamed-Amine Baazizi, Dario Colazzo, Giorgio Ghelli, Carlo Sartiani, and Stefanie Scherzinger. 2026. Witness Generation for Classical JSON Schema. ACM Trans. Database Syst. (Feb. 2026). https://doi.org/10.1145/3799416 Just Accepted

work page doi:10.1145/3799416 2026

[9] [9]

McGuinness, Daniele Nardi, and Pe- ter F

Franz Baader, Diego Calvanese, Deborah L. McGuinness, Daniele Nardi, and Pe- ter F. Patel-Schneider (Eds.). 2003. The Description Logic Handbook: Theory, Im- plementation, and Applications. Cambridge University Press

work page 2003

[10] [10]

Franz Baader and Ulrike Sattler. 2001. An Overview of Tableau Algorithms for Description Logics. Stud Logica 69, 1 (2001), 5–40. https://doi.org/10.1023/A: 1013882326814

work page doi:10.1023/a: 2001

[11] [11]

Mohamed Amine Baazizi, Clément Berti, Dario Colazzo, Giorgio Ghelli, and Carlo Sartiani. 2020. Human-in-the-Loop Schema Inference for Massive JSON Datasets. In EDBT. OpenProceedings.org, 635–638

work page 2020

[12] [12]

Mohamed Amine Baazizi, Dario Colazzo, Giorgio Ghelli, and Carlo Sartiani. 2019. Parametric schema inference for massive JSON datasets. VLDB J. 28, 4 (2019), 497–521

work page 2019

[13] [13]

Scalable Funding of Bitcoin Micropayment Channel Networks

Mohamed Amine Baazizi, Dario Colazzo, Giorgio Ghelli, Carlo Sartiani, and Ste- fanie Scherzinger. 2021. An Empirical Study on the "Usage of Not" in Real-World JSON Schema Documents. In Proc. ER. 102–112. https://doi.org/10.1007/978-3- 030-89022-3_9

work page doi:10.1007/978-3- 2021

[14] [14]

Mohamed Amine Baazizi, Dario Colazzo, Giorgio Ghelli, Carlo Sartiani, and Ste- fanie Scherzinger. 2023. Negation-closure for JSON Schema. Theor. Comput. Sci. 955 (2023), 113823. https://doi.org/10.1016/j.tcs.2023.113823

work page doi:10.1016/j.tcs.2023.113823 2023

[15] [15]

Fernando Suárez Barría. 2016. Formal Specification, Expressiveness, and Complex- ity Analysis for JSON Schema . Master’s thesis. Pontificia Universidad Católica de Chile, Santiago, Chile. https://repositorio.uc.cl/handle/11534/16908

work page 2016

[16] [16]

Guillaume Baudart, Martin Hirzel, Kiran Kate, Parikshit Ram, and Avraham Shin- nar. 2020. Lale: Consistent Automated Machine Learning. CoRR abs/2007.01977 (2020). arXiv:2007.01977 https://arxiv.org/abs/2007.01977

work page arXiv 2020

[17] [17]

Filippo Bonchi and Damien Pous. 2013. Checking NFA equivalence with bisim- ulations up to congruence. In The 40th Annual ACM SIGPLAN-SIGACT Sympo- sium on Principles of Programming Languages, POPL ’13 . ACM, 457–468. https: //doi.org/10.1145/2429069.2429124

work page doi:10.1145/2429069.2429124 2013

[18] [18]

Ahmed Bouajjani, Peter Habermehl, Lukás Holík, Tayssir Touili, and Tomás Vo- jnar. 2008. Antichain-Based Universality and Inclusion Testing over Nondeter- ministic Finite Tree Automata. In Implementation and Applications of Automata, 13th International Conference, CIAA 2008 (Lecture Notes in Computer Science) . Springer, 57–67. https://doi.org/10.1007/978...

work page doi:10.1007/978-3-540-70844-5_7 2008

[19] [19]

Reutter, Fernando Suárez, and Domagoj Vrgoč

Pierre Bourhis, Juan L. Reutter, Fernando Suárez, and Domagoj Vrgoc. 2017. JSON: Data model, Query languages and Schema specification. In Proc. PODS . 123–135. https://doi.org/10.1145/3034786.3056120

work page doi:10.1145/3034786.3056120 2017

[20] [20]

Giuseppe Castagna and Alain Frisch. 2005. A gentle introduction to semantic subtyping. In Proceedings of the 7th International ACM SIGPLAN Conference on Principles and Practice of Declarative Programming . ACM, 198–199. https://doi. org/10.1145/1069774.1069793

work page doi:10.1145/1069774.1069793 2005

[21] [21]

Ruan, Yaxing Cai, Ruihang Lai, Ziyi Xu, Yilong Zhao, and Tianqi Chen

Yixin Dong, Charlie F. Ruan, Yaxing Cai, Ruihang Lai, Ziyi Xu, Yilong Zhao, and Tianqi Chen. 2024. XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models. CoRR abs/2411.15100 (2024). https://doi. org/10.48550/ARXIV.2411.15100 arXiv:2411.15100

work page doi:10.48550/arxiv.2411.15100 2024

[22] [22]

Alain Frisch. 2004. Théorie, conception et réalisation d’un langage de program- mation fonctionnel adapté à XML. Ph. D. Dissertation. Université Paris 7 – Denis Diderot

work page 2004

[23] [23]

Alain Frisch, Giuseppe Castagna, and Véronique Benzaken. 2002. Semantic Sub- typing. In 17th IEEE Symposium on Logic in Computer Science (LICS 2002), . 137–

work page 2002

[24] [24]

https://doi.org/10.1109/LICS.2002.1029823

work page doi:10.1109/lics.2002.1029823 2002

[25] [25]

Alain Frisch, Giuseppe Castagna, and Véronique Benzaken. 2008. Semantic sub- typing: Dealing set-theoretically with function, union, intersection, and nega- tion types. J. ACM 55, 4 (2008), 19:1–19:64. https://doi.org/10.1145/1391289. 1391293

work page doi:10.1145/1391289 2008

[26] [26]

Pierre Genevès, Nabil Layaïda, and Alan Schmitt. 2007. Efficient static analysis of XML paths and types. InProceedings of the ACM SIGPLAN 2007 Conference on Programming Language Design and Implementation, San Diego, California, USA, June 10-13, 2007, Jeanne Ferrante and Kathryn S. McKinley (Eds.). ACM, 342–351. https://doi.org/10.1145/1250734.1250773

work page doi:10.1145/1250734.1250773 2007

[27] [27]

Li, and Konstantinos Mamouras

Alexis Le Glaunec, Angela W. Li, and Konstantinos Mamouras. 2026. Streaming Validation of JSON Documents against Schemas.Proc. VLDB Endow.19, 3 (March 2026), 509–522. https://doi.org/10.14778/3778092.3778109

work page doi:10.14778/3778092.3778109 2026

[28] [28]

Andrew Habib, Avraham Shinnar, Martin Hirzel, and Michael Pradel. 2021. Find- ing Data Compatibility Bugs with JSON Subschema Checking. In Proc. ISSTA. 620–632. https://doi.org/10.1145/3460319.3464796

work page doi:10.1145/3460319.3464796 2021

[29] [29]

Patel-Schneider

Ian Horrocks and Peter F. Patel-Schneider. 1999. Optimizing Description Logic Subsumption. J. Log. Comput. 9, 3 (1999), 267–293. https://doi.org/10.1093/ LOGCOM/9.3.267

work page 1999

[30] [30]

Haruo Hosoya. 2001. Regular Expression Types for XML. Ph. D. Dissertation. The University of Tokyo

work page 2001

[31] [31]

Haruo Hosoya and Makoto Murata. 2003. Boolean Operations for Attribute- Element Constraints. In Implementation and Application of Automata, 8th Inter- national Conference, CIAA 2003, Santa Barbara, California, USA, July 16-18, 2003, Proceedings (Lecture Notes in Computer Science) , Oscar H. Ibarra and Zhe Dang (Eds.). Springer, 201–212. https://doi.org/10...

work page doi:10.1007/3-540-45089-0_19 2003

[32] [32]

Haruo Hosoya and Benjamin C. Pierce. 2001. Regular expression pattern match- ing for XML. In Conference Record of POPL 2001: The 28th ACM SIGPLAN- SIGACT Symposium on Principles of Programming Languages, London, UK, Jan- uary 17-19, 2001 , Chris Hankin and Dave Schmidt (Eds.). ACM, 67–80. https: //doi.org/10.1145/360204.360209

work page doi:10.1145/360204.360209 2001

[33] [33]

Haruo Hosoya and Benjamin C. Pierce. 2003. XDuce: A statically typed XML processing language. ACM Trans. Internet Techn. 3, 2 (2003), 117–148. https: //doi.org/10.1145/767193.767195

work page doi:10.1145/767193.767195 2003

[34] [34]

Haruo Hosoya, Jerome Vouillon, and Benjamin C. Pierce. 2005. Regular ex- pression types for XML. ACM Trans. Program. Lang. Syst. 27, 1 (2005), 46–90. https://doi.org/10.1145/1053468.1053470

work page doi:10.1145/1053468.1053470 2005

[35] [35]

Stefan Klessinger, Michael Fruth, Valentin Gittinger, Meike Klettke, Uta Störl, and Stefanie Scherzinger. 2023. Tagger: A Tool for the Discovery of Tagged Unions in JSON Schema Extraction. InProceedings 26th International Conference on Extending Database Technology, EDBT 2023, Ioannina, Greece, March 28-31,

work page 2023

[36] [36]

https://doi.org/10.48786/EDBT.2023.75

OpenProceedings.org, 827–830. https://doi.org/10.48786/EDBT.2023.75

work page doi:10.48786/edbt.2023.75 2023

[37] [37]

Meike Klettke, Uta Störl, and Stefanie Scherzinger. 2015. Schema Extraction and Structural Outlier Detection for JSON-based NoSQL Data Stores. In Proc. BTW (LNI, Vol. P-241). 425–444

work page 2015

[38] [38]

Yu Liu, Duantengchuan Li, Kaili Wang, Zhuoran Xiong, Fobo Shi, Jian Wang, Bing Li, and Bo Hang. 2024. Are LLMs good at structured outputs? A benchmark for evaluating structured output capabilities in LLMs. Inf. Process. Manag. 61, 5 (2024), 103809. https://doi.org/10.1016/J.IPM.2024.103809

work page doi:10.1016/j.ipm.2024.103809 2024

[39] [39]

Reutter, Fernando Suárez, Martín Ugarte, and Domagoj Vrgoc

Felipe Pezoa, Juan L. Reutter, Fernando Suárez, Martín Ugarte, and Domagoj Vrgoc. 2016. Foundations of JSON Schema. In Proc. WWW. 263–273

work page 2016

[40] [40]

William Spoth, Oliver Kennedy, Ying Lu, Beda Christoph Hammerschmidt, and Zhen Hua Liu. 2021. Reducing Ambiguity in Json Schema Discovery. InSIGMOD Conference. ACM, 1732–1744

work page 2021

[41] [41]

Tadahiro Suda and Haruo Hosoya. 2005. Non-backtracking Top-Down Algo- rithm for Checking Tree Automata Containment. In Implementation and Appli- cation of Automata, 10th International Conference, CIAA 2005, Sophia Antipolis, France, June 27-29, 2005, Revised Selected Papers (Lecture Notes in Computer Sci- ence), Jacques Farré, Igor Litovsky, and Sylvain S...

work page doi:10.1007/11605157_25 2005

[42] [42]

Henzinger, and Jean-François Raskin

Martin De Wulf, Laurent Doyen, Thomas A. Henzinger, and Jean-François Raskin. 2006. Antichains: A New Algorithm for Checking Universality of Fi- nite Automata. In Computer Aided Verification, 18th International Conference, CA V 2006, Seattle, W A, USA, August 17-20, 2006, Proceedings (Lecture Notes in Computer Science) , Thomas Ball and Robert B. Jones (E...

work page doi:10.1007/11817963_5 2006

[43] [43]

Joohyung Yun, Byungchul Tak, and Wook-Shin Han. 2024. ReCG: Bottom-up JSON Schema Discovery Using a Repetitive Cluster-and-Generalize Framework. Proc. VLDB Endow.17, 11 (July 2024), 3538–3550. https://doi.org/10.14778/3681954. 3682019

work page doi:10.14778/3681954 2024