pith. machine review for the scientific record.

arxiv: 2604.09237 · v1 · submitted 2026-04-10 · 💻 cs.CL

ScheMatiQ: From Research Question to Structured Data through Interactive Schema Discovery

Pith reviewed 2026-05-10 17:43 UTC · model grok-4.3

classification 💻 cs.CL
keywords schema discovery · interactive extraction · large language models · structured data · natural language questions · document collections · domain experts · computational biology

The pith

ScheMatiQ converts natural-language research questions over document collections into schemas and structured databases that support real-world analysis.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Many disciplines require structured evidence from large document collections to answer natural-language research questions, but the traditional route of manual schema design and exhaustive labeling is slow and error-prone. ScheMatiQ uses calls to a backbone LLM to generate an initial schema and grounded database directly from the question and corpus. A web interface lets users steer and revise the extraction process interactively. Collaborations with domain experts show that the resulting outputs enable genuine analysis tasks in law and computational biology. The system is released as open source with a public interface to allow experts in other fields to apply it to their own data.

Core claim

ScheMatiQ leverages calls to a backbone LLM to take a question and a corpus to produce a schema and a grounded database, with a web interface that lets users steer and revise the extraction, and in collaboration with domain experts this yields outputs that support real-world analysis in law and computational biology.

What carries the argument

Interactive schema discovery process that combines LLM-generated schemas and extractions with user steering through a web interface to produce a grounded database.
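The process can be pictured with a minimal sketch that follows the three stages named in the Figure 1 caption (observation unit, query-guided schema, value extraction) plus one user revision. Everything here is a hypothetical illustration, not the authors' implementation: the `LLM` callable, the prompt strings, `Schema`, `discover_schema`, `extract`, `revise`, and `toy_llm` are assumed names, and the sketch simplifies by producing one row per document.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical stand-in for the backbone LLM: prompt string in, reply string out.
LLM = Callable[[str], str]


@dataclass
class Schema:
    observation_unit: str
    fields: list[str] = field(default_factory=list)


def discover_schema(question: str, llm: LLM) -> Schema:
    """Stages 1 and 2: ask for the observation unit, then for schema fields."""
    unit = llm(f"UNIT|{question}").strip()
    raw = llm(f"FIELDS|{unit}|{question}")
    return Schema(unit, [f.strip() for f in raw.split(",") if f.strip()])


def extract(schema: Schema, docs: list[str], llm: LLM) -> list[dict[str, str]]:
    """Stage 3: one row per document (a simplification), one value per field."""
    rows = []
    for i, doc in enumerate(docs):
        row = {"doc_id": str(i)}
        for name in schema.fields:
            row[name] = llm(f"VALUE|{name}|{doc}").strip()
        rows.append(row)
    return rows


def revise(schema: Schema, add=(), drop=()) -> Schema:
    """User steering: drop unwanted fields, append new ones, keep the unit."""
    kept = [f for f in schema.fields if f not in drop]
    return Schema(schema.observation_unit, kept + [f for f in add if f not in kept])


def toy_llm(prompt: str) -> str:
    """Deterministic fake model so the sketch runs without any API."""
    kind, _, rest = prompt.partition("|")
    if kind == "UNIT":
        return "court decision"
    if kind == "FIELDS":
        return "judge, outcome"
    name, _, doc = rest.partition("|")
    for part in doc.split(";"):
        key, _, value = part.partition("=")
        if key.strip() == name:
            return value.strip()
    return ""  # the toy model reports "not found" as an empty string


if __name__ == "__main__":
    docs = ["judge=Smith; outcome=granted", "judge=Lee; outcome=denied"]
    schema = discover_schema("Do outcomes differ by judge?", toy_llm)
    print(extract(schema, docs, toy_llm))
    print(extract(revise(schema, add=["year"], drop=["outcome"]), docs, toy_llm))
```

In this sketch a revision only changes the schema and extraction is rerun afterwards; whether the real system re-extracts incrementally is not stated on this page.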

Load-bearing premise

LLM-generated schemas and extractions, even after interactive user steering, produce sufficiently accurate and complete structured data to support genuine domain analysis without introducing critical errors or omissions.

What would settle it

A case in law or computational biology where the structured database produced by ScheMatiQ leads experts to an incorrect conclusion that they can verify and correct by direct reference to the original documents.
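Part of that verification could be mechanized, under the assumption that each extracted value is stored together with a verbatim evidence quote from its source document. The sketch below is hypothetical (`Cell`, `normalize`, and `ungrounded` are illustrative names, and the page does not say how ScheMatiQ represents grounding); it flags cells whose quote cannot be found in the cited document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    doc_id: str
    field: str
    value: str
    evidence: str  # assumed: a verbatim quote copied from the source document


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so line breaks do not block a match."""
    return " ".join(text.lower().split())


def ungrounded(cells: list[Cell], corpus: dict[str, str]) -> list[Cell]:
    """Return cells whose evidence quote is empty or absent from their document."""
    flagged = []
    for cell in cells:
        source = normalize(corpus.get(cell.doc_id, ""))
        quote = normalize(cell.evidence)
        if not quote or quote not in source:
            flagged.append(cell)
    return flagged
```

A cell that passes this check only has a quote that exists in the document; whether the quote actually supports the value still needs an expert's reading.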

Figures

Figures reproduced from arXiv: 2604.09237 by Barak Raveh, Eliya Habba, Gabriel Stanovsky, Renana Keydar, Reshef Mintz, Shahar Levy.

Figure 1
Figure 1: ScheMatiQ workflow. Given a natural-language question and a document collection, the system (1) discovers the observation unit, (2) discovers a query-guided schema, and (3) extracts structured values from the documents. Researchers can refine the schema and results through an interactive feedback loop.
Figure 2
Figure 2: Diagrams illustrating the three system com…
Figure 3
Figure 3: Screenshots of the ScheMatiQ web interface. Users provide a query and documents, inspect and refine the…
Figure 4
Figure 4: Schema-field coverage relative to manually cu…
Figure 5
Figure 5: Schema-field overlap across three input condi…
Figure 6
Figure 6: Different research questions over the same collection of documents lead to different observation units. A…
Figure 7
Figure 7: Simplified LLM prompt excerpts illustrating two core stages of the system pipeline: (a) observation unit…
original abstract

Many disciplines pose natural-language research questions over large document collections whose answers typically require structured evidence, traditionally obtained by manually designing an annotation schema and exhaustively labeling the corpus, a slow and error-prone process. We introduce ScheMatiQ, which leverages calls to a backbone LLM to take a question and a corpus to produce a schema and a grounded database, with a web interface that lets users steer and revise the extraction. In collaboration with domain experts, we show that ScheMatiQ yields outputs that support real-world analysis in law and computational biology. We release ScheMatiQ as open source with a public web interface, and invite experts across disciplines to use it with their own data. All resources, including the website, source code, and demonstration video, are available at: www.ScheMatiQ-ai.com

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper introduces ScheMatiQ, a system that takes a natural-language research question and document corpus as input, uses calls to a backbone LLM to produce a schema and grounded database, and provides a web interface allowing users to interactively steer and revise the extraction process. Through collaborations with domain experts, the authors claim that the resulting outputs support real-world analysis in law and computational biology. The tool is released as open source with a public web interface and demonstration resources.

Significance. If the interactive steering mechanism reliably produces accurate, complete structured data without critical omissions or errors, ScheMatiQ could meaningfully reduce the manual effort required for schema design and annotation in document-heavy fields. The open-source release, public interface, and explicit invitation for experts to apply it to their own data are concrete strengths that could accelerate adoption and community feedback. However, the lack of any quantitative evaluation makes it impossible to assess whether these benefits are realized in practice.

major comments (2)
  1. [Abstract] The central claim that 'in collaboration with domain experts, we show that ScheMatiQ yields outputs that support real-world analysis in law and computational biology' is presented without any supporting quantitative evidence such as precision/recall, error rates, inter-expert agreement scores, gold-standard comparisons, or descriptions of the specific analyses enabled and errors corrected. This absence directly undermines evaluation of whether user steering mitigates known LLM failure modes in technical domains.
  2. [System and Evaluation sections] The manuscript describes the web interface for steering but supplies no case-study details, error analysis, or before/after comparisons showing how interactive revisions addressed LLM-specific issues (e.g., hallucinations, incomplete extractions, or domain-specific inaccuracies) in the law or computational biology examples.
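As an illustration of the kind of gold-standard comparison requested above, field-level precision and recall of a generated schema against a manually curated one can be computed in a few lines. This is a hedged sketch, not the paper's evaluation protocol: `field_overlap`, its name normalization, and the example field names are assumptions, and exact matching after normalization will miss synonymous fields that a human judge would accept.

```python
def normalize(name: str) -> str:
    """Treat 'Judge_Name' and 'judge name' as the same field."""
    return " ".join(name.lower().replace("_", " ").split())


def field_overlap(predicted: list[str], gold: list[str]) -> dict[str, float]:
    """Set-based precision, recall, and F1 over normalized field names."""
    pred = {normalize(f) for f in predicted}
    ref = {normalize(f) for f in gold}
    hits = len(pred & ref)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    generated = ["Judge_Name", "outcome", "court"]
    curated = ["judge name", "Outcome", "appointing president", "year"]
    print(field_overlap(generated, curated))
```

Recall here corresponds to coverage of the curated schema, while fields outside the curated set lower precision even though they may be useful additions; the two numbers are best reported separately.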

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive recognition of ScheMatiQ's potential impact, the open-source release, and the invitation for community use. We agree that the current manuscript would benefit from greater transparency regarding the nature and limitations of the evaluation, and we will revise accordingly to strengthen the presentation of the case studies.

point-by-point responses
  1. Referee: [Abstract] The central claim that 'in collaboration with domain experts, we show that ScheMatiQ yields outputs that support real-world analysis in law and computational biology' is presented without any supporting quantitative evidence such as precision/recall, error rates, inter-expert agreement scores, gold-standard comparisons, or descriptions of the specific analyses enabled and errors corrected. This absence directly undermines evaluation of whether user steering mitigates known LLM failure modes in technical domains.

    Authors: We agree that the abstract claim is stated without quantitative metrics. The collaborations with domain experts provided qualitative validation: experts reviewed the generated schemas and extracted data for their specific research questions and confirmed that the outputs were sufficiently accurate and complete to support downstream analysis in their fields. No formal precision/recall or inter-annotator agreement scores were computed. We will revise the abstract to explicitly characterize the evaluation as qualitative expert validation and will add a short description of the types of analyses enabled and the classes of LLM errors that steering helped correct.
    revision: yes

  2. Referee: [System and Evaluation sections] The manuscript describes the web interface for steering but supplies no case-study details, error analysis, or before/after comparisons showing how interactive revisions addressed LLM-specific issues (e.g., hallucinations, incomplete extractions, or domain-specific inaccuracies) in the law or computational biology examples.

    Authors: We acknowledge that the current manuscript provides only high-level descriptions of the law and computational-biology use cases. In the revised version we will expand the System and Evaluation sections with concrete case-study details drawn from the expert collaborations. These additions will include: (1) specific examples of LLM hallucinations or incomplete extractions that occurred, (2) the user steering actions taken via the interface to correct them, and (3) before/after comparisons of the resulting schemas and grounded data. This will make explicit how the interactive mechanism mitigates the failure modes mentioned.
    revision: yes

Circularity Check

0 steps flagged

No circularity: descriptive system paper with no derivations or self-referential predictions

full rationale

The paper describes an interactive LLM-based tool for schema discovery and data extraction from documents, evaluated via domain-expert collaboration in law and biology. No mathematical derivations, equations, fitted parameters, or 'predictions' appear in the provided text. The central claim (that outputs support real-world analysis) is presented as an empirical demonstration rather than a reduction to prior inputs by construction. No self-citation chains, uniqueness theorems, or ansatzes are invoked to justify core results. This matches the default expectation for non-circular system papers.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim depends on the assumption that backbone LLMs can generate useful schemas and grounded extractions when guided interactively; this is treated as a domain assumption rather than something derived or proven in the work.

axioms (1)
  • domain assumption: Large language models can be effectively prompted to discover schemas and extract structured information from document collections when provided with a natural language question.
    The system is built around repeated calls to a backbone LLM for schema production and grounding; no independent verification of this capability is supplied in the abstract.

pith-pipeline@v0.9.0 · 5454 in / 1248 out tokens · 88056 ms · 2026-05-10T17:43:59.813323+00:00 · methodology


Reference graph

Works this paper leans on

6 extracted references · 6 canonical work pages

  1. [1] Are Trump judges different? Evidence from immigration cases. September 15, 2025. USC CLASS Research Paper, (2519).

  2. [2] Benjamin Newman, Yoonjoo Lee, Aakanksha Naik, Pao Siangliulue, Raymond Fok, Juho Kim, Daniel S Weld, Joseph Chee Chang, and Kyle Lo. ArxivDIGESTables: Synthesizing scientific literature into tables using language models. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 9612–9631, Miami, Florida, USA. Association for Computational Linguistics.

  3. [3] Intent-aware schema generation and refinement for literature review tables. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 23450–23472, Suzhou, China. Association for Computational Linguistics.

  4. [4] To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning. arXiv preprint arXiv:2409.12183, 2024.

  5. [5] Transformers: State-of-the-art natural language processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 38–45, Online. Association for Computational Linguistics.

  6. [6] NESdb: a database of NES-containing CRM1 cargoes. Molecular Biology of the Cell, 23(18):3673–3676.

A Use Cases: Full Specifications. Legal Domain. Dataset. Court decisions of U.S. court cases concerning immigration policies and injunction proceedings. Full Query. Do federal judges appointed by different Presidents (Trump vs. other Republican vs. Democratic)...