Demonstration of Pneuma-Seeker: Agentic System for Reifying and Fulfilling Information Needs on Tabular Data

Muhammad Imam Luthfi Balaka; Raul Castro Fernandez

arxiv: 2604.14422 · v1 · submitted 2026-04-15 · 💻 cs.AI

Pith reviewed 2026-05-10 12:54 UTC · model grok-4.3

keywords Pneuma-Seeker · relational specifications · information reification · tabular data · LLM agents · data discovery · provenance tracking · procurement scenarios

The pith

Pneuma-Seeker converts a user's vague information need on tabular data into explicit, inspectable relational specifications.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Pneuma-Seeker as a system that takes initial underspecified questions from data analysts and turns them into clear relational specifications users can examine and change. This supports step-by-step refinement of the query, focused retrieval of relevant tables or columns, and tracking of how each result was obtained. The approach treats large language models as visible partners that help build and adjust these specifications rather than delivering final answers in one step. Demonstrations with two procurement scenarios show the workflow in practice, where analysts start broad and narrow their needs through interaction. The core benefit is greater control and transparency over the analytical process compared to direct question-answering tools.

Core claim

Pneuma-Seeker is an agentic system that reifies a user's information need as explicit, inspectable relational specifications. This reification enables iterative refinement of the information need, targeted data discovery, and provenance-aware execution. Through two real-world procurement use cases, the system leverages LLMs as transparent, interactive analytical collaborators rather than opaque answer engines.

What carries the argument

Reification of an information need into explicit relational specifications that users can inspect, modify, and trace for provenance.

If this is right

Analysts gain the ability to iteratively adjust their query by directly editing the visible relational specification.
Data retrieval narrows to only the tables and columns that satisfy the current specification.
Each execution step records its origin so results remain traceable back to the original need.
LLMs contribute to building and updating the specification rather than generating standalone answers.
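
The consequences listed above can be made concrete with a small sketch. It is not the paper's implementation: `RelationalSpec`, `refine`, `relevant_tables`, and the toy catalog are hypothetical names and data, chosen only to show how an editable specification, narrowed retrieval, and a recorded edit history might fit together.

```python
from dataclasses import dataclass, field


@dataclass
class RelationalSpec:
    """Hypothetical reified information need: target columns plus row filters."""
    columns: list
    filters: dict = field(default_factory=dict)
    history: list = field(default_factory=list)  # provenance of every edit

    def refine(self, note, *, add_columns=(), add_filters=None):
        """Apply a user edit in place and record where it came from."""
        for col in add_columns:
            if col not in self.columns:
                self.columns.append(col)
        self.filters.update(add_filters or {})
        self.history.append(note)


def relevant_tables(spec, catalog):
    """Keep only tables offering a column that the current spec mentions."""
    wanted = set(spec.columns) | set(spec.filters)
    hits = {}
    for table, cols in catalog.items():
        overlap = sorted(wanted & set(cols))
        if overlap:
            hits[table] = overlap
    return hits


# Illustrative catalog (made up for this sketch): table name -> column names.
catalog = {
    "purchase_orders": ["vendor", "amount", "order_date"],
    "vendors": ["vendor", "region"],
    "employees": ["name", "department"],
}

spec = RelationalSpec(columns=["vendor", "amount"])
spec.history.append("initial need: spending per vendor")
spec.refine("user: restrict to one region", add_filters={"region": "Midwest"})
hits = relevant_tables(spec, catalog)
```

In this sketch the `employees` table drops out because it shares no column with the specification, and `spec.history` holds one entry per change, so a result can be traced back to the edits that shaped it.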

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The explicit-specification step may reduce the chance that an LLM silently misinterprets an evolving analytical goal.
The same reification pattern could be tested on non-relational sources such as time-series or graph data.
Adoption would require the relational specifications to integrate cleanly with existing query engines and visualization tools.

Load-bearing premise

That LLMs can reliably act as transparent interactive collaborators on real procurement data without hidden errors or opaque reasoning steps.

What would settle it

A procurement use-case session where the generated relational specification does not match the user's stated intent after refinement or where provenance records become incomplete.
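
One way such a session could be checked is sketched below. The function `audit_session` and its inputs are hypothetical and do not come from the paper. The sketch assumes the user's intent can be written down as a list of column names and that provenance is a mapping from result identifiers to source tables.

```python
def audit_session(intended, specified, result_ids, provenance):
    """Report spec/intent mismatches and results that lack a provenance record.

    intended, specified: column names the user wants vs. those in the spec.
    result_ids: identifiers of produced results.
    provenance: hypothetical mapping result id -> list of source tables.
    """
    missing = sorted(set(intended) - set(specified))
    extra = sorted(set(specified) - set(intended))
    untraced = sorted(r for r in result_ids if not provenance.get(r))
    return {"missing": missing, "extra": extra, "untraced": untraced}


# Made-up session data for illustration only.
report = audit_session(
    intended=["vendor", "price"],
    specified=["vendor", "turnaround"],
    result_ids=["r1", "r2"],
    provenance={"r1": ["vendor_proposals"]},
)
```

A non-empty `missing`, `extra`, or `untraced` list would be the kind of evidence described above: the specification has drifted from the stated intent, or a result cannot be traced to its sources.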

Figures

Figures reproduced from arXiv: 2604.14422 by Muhammad Imam Luthfi Balaka, Raul Castro Fernandez.

**Figure 1.** The architecture of Pneuma-Seeker, which consists of three core components: Conductor, Materializer, and Retriever. Conductor orchestrates the workflow. It translates I into (T, 𝑆), plans actions, invokes retrieval and materialization, and manages user interaction. Materializer constructs the views in T by applying relational and semantic operators, producing interme…
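
The division of labour in the Figure 1 caption can be pictured with a toy sketch. The three class names follow the caption, but every method, signature, and data row here is an assumption made for illustration. The real components use LLMs and relational and semantic operators, which this sketch replaces with plain column matching and projection. The symbol 𝑆 is left out because the excerpt does not define it.

```python
class Retriever:
    """Toy retriever: returns names of tables exposing any requested column."""

    def __init__(self, tables):
        self.tables = tables  # table name -> list of row dicts

    def find(self, columns):
        return [name for name, rows in self.tables.items()
                if rows and set(columns) & set(rows[0])]


class Materializer:
    """Toy materializer: projects the requested columns out of a table."""

    def build(self, rows, columns):
        return [{c: r[c] for c in columns if c in r} for r in rows]


class Conductor:
    """Toy conductor: invokes retrieval, then materialization, logging each step."""

    def __init__(self, retriever, materializer):
        self.retriever = retriever
        self.materializer = materializer
        self.log = []

    def run(self, columns):
        views = {}
        for name in self.retriever.find(columns):
            self.log.append(f"retrieved {name}")
            rows = self.retriever.tables[name]
            views[name] = self.materializer.build(rows, columns)
            self.log.append(f"materialized view over {name}")
        return views


# Made-up rows for illustration only.
tables = {
    "internal_tests": [{"test": "CBC", "cost": 12.0, "lab": "A"}],
    "staff": [{"name": "Sam"}],
}
conductor = Conductor(Retriever(tables), Materializer())
views = conductor.run(["test", "cost"])
```

Keeping the log inside the orchestrating component is one simple design choice for making each step traceable. The paper's actual provenance mechanism is not described in this excerpt.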

**Figure 2.** Demonstration of Pneuma-Seeker on two use cases. The system produces an initial response in 1 minute and 41 seconds. After refinement of the integration strategy, a second response is produced in 3 minutes and 27 seconds (Figure 2b). For clarity, we refer to the two input tables as the internal test table and the vendor proposal table. The internal table contains laboratory tests performed in-house, including attrib…

Original abstract

Data analysts working with relational data often start with vague or underspecified questions and refine them iteratively as they explore the data. To support this iterative process, we demonstrate Pneuma-Seeker, a system that reifies a user's information need as explicit, inspectable relational specifications, enabling iterative refinement of the information need, targeted data discovery, and provenance-aware execution. Through two real-world procurement use cases, we show how Pneuma-Seeker leverages LLMs as transparent, interactive analytical collaborators rather than opaque answer engines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces Pneuma-Seeker, an agentic LLM-based system that reifies a user's vague or underspecified information needs over tabular/relational data into explicit, inspectable relational specifications. These specifications are intended to support iterative refinement of the need, targeted data discovery, and provenance-aware execution. The approach is demonstrated via two real-world procurement use cases, with the goal of positioning LLMs as transparent, interactive analytical collaborators rather than opaque answer engines.

Significance. If the described reification mechanism and transparency guarantees hold in practice, the work could contribute to more reliable human-AI collaboration in data analysis workflows by making intermediate reasoning steps explicit and editable. The emphasis on relational specifications as an intermediate representation is a potentially useful idea for bridging natural language queries and structured data operations. However, the absence of any implementation details, quantitative metrics, or validation leaves the practical significance unassessable from the current manuscript.

major comments (2)

The central claim that Pneuma-Seeker produces 'explicit, inspectable relational specifications' enabling 'provenance-aware execution' and transparency is load-bearing but unsupported: the two procurement use cases provide only high-level narrative descriptions with no reported error analysis, ground-truth comparison of generated specifications, schema-mapping accuracy, or trace of how LLM outputs were validated or corrected for hallucinations or incompleteness.
No implementation details are supplied for the agentic components (e.g., how relational specifications are constructed from LLM outputs, how provenance is tracked, or how iterative refinement is operationalized), which prevents evaluation of whether the system actually achieves the claimed inspectability and reliability in the described scenarios.

minor comments (1)

The abstract and demonstration sections would benefit from clearer delineation between the system architecture and the specific use-case outcomes to improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our demonstration paper. We address each major comment below, with a focus on clarifying the scope of a demonstration while committing to improvements where feasible.

Referee: The central claim that Pneuma-Seeker produces 'explicit, inspectable relational specifications' enabling 'provenance-aware execution' and transparency is load-bearing but unsupported: the two procurement use cases provide only high-level narrative descriptions with no reported error analysis, ground-truth comparison of generated specifications, schema-mapping accuracy, or trace of how LLM outputs were validated or corrected for hallucinations or incompleteness.

Authors: We acknowledge that the use cases are presented as high-level narratives illustrating the reification workflow rather than as a quantitative study. As this is a demonstration paper, the emphasis is on showing how underspecified needs are made explicit and iteratively refined in real procurement scenarios, not on benchmarking LLM accuracy. We will revise to incorporate more detailed traces of the generated relational specifications, examples of user inspection and refinement steps, and narrative descriptions of how outputs were validated in the cases. However, systematic error analysis, ground-truth comparisons, and hallucination metrics are outside the current scope and will be noted as future work.

revision: partial

Referee: No implementation details are supplied for the agentic components (e.g., how relational specifications are constructed from LLM outputs, how provenance is tracked, or how iterative refinement is operationalized), which prevents evaluation of whether the system actually achieves the claimed inspectability and reliability in the described scenarios.

Authors: The manuscript currently describes the agentic process at the level of the use-case workflows. We agree that additional technical specifics would improve evaluability and will revise the paper to include more detailed descriptions (and, where appropriate, pseudocode) of how LLM outputs are parsed into relational specifications, how provenance is maintained across refinement iterations, and how the iterative loop is operationalized. This will be added without altering the demonstration focus.

revision: yes

Circularity Check

0 steps flagged

No circularity: system demonstration without derivations or self-referential claims

The paper is a forward demonstration of Pneuma-Seeker through two procurement use cases, describing how it reifies information needs as relational specifications. No equations, derivations, fitted parameters, uniqueness theorems, or self-citations appear in the provided text or abstract. The central claims rest on the system's design and LLM usage rather than reducing to any prior inputs by construction, satisfying the criteria for a self-contained non-circular presentation.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a system demonstration paper with no mathematical modeling, so the ledger contains no entries.

pith-pipeline@v0.9.0 · 5386 in / 981 out tokens · 40550 ms · 2026-05-10T12:54:34.745248+00:00 · methodology
