Demonstration of Pneuma-Seeker: Agentic System for Reifying and Fulfilling Information Needs on Tabular Data
Pith reviewed 2026-05-10 12:54 UTC · model grok-4.3
The pith
Pneuma-Seeker converts a user's vague information need on tabular data into explicit, inspectable relational specifications.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Pneuma-Seeker is an agentic system that reifies a user's information need as explicit, inspectable relational specifications. This reification enables iterative refinement of the information need, targeted data discovery, and provenance-aware execution. Through two real-world procurement use cases, the paper shows how the system uses LLMs as transparent, interactive analytical collaborators rather than opaque answer engines.
What carries the argument
Reification of an information need into explicit relational specifications that users can inspect, modify, and trace for provenance.
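To make the idea of an inspectable relational specification concrete, here is a minimal sketch in Python. It is not Pneuma-Seeker's actual format, which the paper does not spell out. The class `RelationalSpec`, its fields, and the table and column names (`purchase_orders`, `vendor`, `amount`, `order_year`) are all hypothetical, chosen only to illustrate a specification that a user can read, edit, and render as a query.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class RelationalSpec:
    """Hypothetical illustration only; not the paper's specification format."""
    table: str
    columns: Tuple[str, ...]
    filters: Tuple[str, ...] = ()
    group_by: Tuple[str, ...] = ()

    def to_sql(self) -> str:
        # Render the specification as a SQL string the user can inspect.
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        if self.filters:
            sql += " WHERE " + " AND ".join(self.filters)
        if self.group_by:
            sql += " GROUP BY " + ", ".join(self.group_by)
        return sql


# A first, underspecified reading of "which vendors get the most spend?"
spec = RelationalSpec(
    table="purchase_orders",  # assumed table name
    columns=("vendor", "SUM(amount) AS total_spend"),
    group_by=("vendor",),
)

# The user refines the need by editing one visible field; the original
# specification is left unchanged because the dataclass is frozen.
refined = replace(spec, filters=("order_year = 2024",))

print(spec.to_sql())
print(refined.to_sql())
```

Keeping the specification immutable and producing a new object on each refinement is one possible design choice: every earlier version stays available for comparison, which fits the emphasis on inspection and traceability.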
If this is right
- Analysts gain the ability to iteratively adjust their query by directly editing the visible relational specification.
- Data retrieval narrows to only the tables and columns that satisfy the current specification.
- Each execution step records its origin so results remain traceable back to the original need.
- LLMs contribute to building and updating the specification rather than generating standalone answers.
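The second and third points above can be sketched together. The snippet below assumes a toy in-memory catalog and a simple append-only log; `candidate_tables`, `ProvenanceLog`, and all table and column names are hypothetical and are not taken from the paper. It shows discovery narrowing to tables that cover the columns a specification requires, and each step being recorded alongside the original need.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ProvenanceLog:
    """Hypothetical append-only record tying each step to the original need."""
    need: str
    steps: List[dict] = field(default_factory=list)

    def record(self, action: str, detail: str) -> None:
        self.steps.append({
            "step": len(self.steps) + 1,
            "action": action,
            "detail": detail,
            "need": self.need,
        })


def candidate_tables(catalog: Dict[str, Set[str]], required: Set[str]) -> List[str]:
    # Keep only tables whose columns cover everything the specification needs.
    return sorted(name for name, cols in catalog.items() if required <= cols)


# Assumed toy catalog: table name -> set of column names.
catalog = {
    "purchase_orders": {"vendor", "amount", "order_year"},
    "vendors": {"vendor", "city"},
    "employees": {"name", "department"},
}

log = ProvenanceLog(need="Which vendors received the most spend?")
required = {"vendor", "amount"}

matches = candidate_tables(catalog, required)
log.record("discover", f"required={sorted(required)} matched={matches}")
log.record("execute", "aggregated amount by vendor on purchase_orders")

for entry in log.steps:
    print(entry)
```

A real system would likely match columns semantically rather than by exact name; exact set containment is used here only to keep the sketch short.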
Where Pith is reading between the lines
- The explicit-specification step may reduce the chance that an LLM silently misinterprets an evolving analytical goal.
- The same reification pattern could be tested on non-relational sources such as time-series or graph data.
- Adoption would require the relational specifications to integrate cleanly with existing query engines and visualization tools.
Load-bearing premise
That LLMs can reliably act as transparent interactive collaborators on real procurement data without hidden errors or opaque reasoning steps.
What would settle it
A procurement use-case session where the generated relational specification does not match the user's stated intent after refinement or where provenance records become incomplete.
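One half of this test, incomplete provenance, lends itself to a mechanical check. The sketch below assumes each recorded step carries an `id` and a `parent` link, with the original need as the only step whose parent is `None`. That record layout and the function `missing_provenance` are assumptions made for illustration, not something the paper describes.

```python
from typing import Dict, List, Optional


def missing_provenance(steps: List[Dict[str, Optional[str]]]) -> List[str]:
    """Return ids of steps whose parent chain does not reach a root step."""
    by_id = {s["id"]: s for s in steps}
    broken = []
    for s in steps:
        seen = set()
        cur = s
        while cur["parent"] is not None:
            # A dangling parent or a cycle means the chain cannot be traced.
            if cur["parent"] not in by_id or cur["id"] in seen:
                broken.append(s["id"])
                break
            seen.add(cur["id"])
            cur = by_id[cur["parent"]]
    return broken


# Hypothetical session: the result points at a spec version never recorded.
session = [
    {"id": "need", "parent": None},
    {"id": "spec_v1", "parent": "need"},
    {"id": "spec_v2", "parent": "spec_v1"},
    {"id": "result", "parent": "spec_v3"},
]

print(missing_provenance(session))
```

The other half, whether the refined specification matches the user's stated intent, still needs human judgment or a ground-truth comparison and is not covered by this check.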
Original abstract
Data analysts working with relational data often start with vague or underspecified questions and refine them iteratively as they explore the data. To support this iterative process, we demonstrate Pneuma-Seeker, a system that reifies a user's information need as explicit, inspectable relational specifications, enabling iterative refinement of the information need, targeted data discovery, and provenance-aware execution. Through two real-world procurement use cases, we show how Pneuma-Seeker leverages LLMs as transparent, interactive analytical collaborators rather than opaque answer engines.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Pneuma-Seeker, an agentic LLM-based system that reifies a user's vague or underspecified information needs over tabular/relational data into explicit, inspectable relational specifications. These specifications are intended to support iterative refinement of the need, targeted data discovery, and provenance-aware execution. The approach is demonstrated via two real-world procurement use cases, with the goal of positioning LLMs as transparent, interactive analytical collaborators rather than opaque answer engines.
Significance. If the described reification mechanism and transparency guarantees hold in practice, the work could contribute to more reliable human-AI collaboration in data analysis workflows by making intermediate reasoning steps explicit and editable. The emphasis on relational specifications as an intermediate representation is a potentially useful idea for bridging natural language queries and structured data operations. However, the absence of any implementation details, quantitative metrics, or validation leaves the practical significance unassessable from the current manuscript.
major comments (2)
- The central claim that Pneuma-Seeker produces 'explicit, inspectable relational specifications' enabling 'provenance-aware execution' and transparency is load-bearing but unsupported: the two procurement use cases provide only high-level narrative descriptions with no reported error analysis, ground-truth comparison of generated specifications, schema-mapping accuracy, or trace of how LLM outputs were validated or corrected for hallucinations or incompleteness.
- No implementation details are supplied for the agentic components (e.g., how relational specifications are constructed from LLM outputs, how provenance is tracked, or how iterative refinement is operationalized), which prevents evaluation of whether the system actually achieves the claimed inspectability and reliability in the described scenarios.
minor comments (1)
- The abstract and demonstration sections would benefit from clearer delineation between the system architecture and the specific use-case outcomes to improve readability.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our demonstration paper. We address each major comment below, with a focus on clarifying the scope of a demonstration while committing to improvements where feasible.
- Referee: The central claim that Pneuma-Seeker produces 'explicit, inspectable relational specifications' enabling 'provenance-aware execution' and transparency is load-bearing but unsupported: the two procurement use cases provide only high-level narrative descriptions with no reported error analysis, ground-truth comparison of generated specifications, schema-mapping accuracy, or trace of how LLM outputs were validated or corrected for hallucinations or incompleteness.
  Authors: We acknowledge that the use cases are presented as high-level narratives illustrating the reification workflow rather than as a quantitative study. As this is a demonstration paper, the emphasis is on showing how underspecified needs are made explicit and iteratively refined in real procurement scenarios, not on benchmarking LLM accuracy. We will revise to incorporate more detailed traces of the generated relational specifications, examples of user inspection and refinement steps, and narrative descriptions of how outputs were validated in the cases. However, systematic error analysis, ground-truth comparisons, and hallucination metrics are outside the current scope and will be noted as future work.
  Revision: partial
- Referee: No implementation details are supplied for the agentic components (e.g., how relational specifications are constructed from LLM outputs, how provenance is tracked, or how iterative refinement is operationalized), which prevents evaluation of whether the system actually achieves the claimed inspectability and reliability in the described scenarios.
  Authors: The manuscript currently describes the agentic process at the level of the use-case workflows. We agree that additional technical specifics would improve evaluability and will revise the paper to include more detailed descriptions (and, where appropriate, pseudocode) of how LLM outputs are parsed into relational specifications, how provenance is maintained across refinement iterations, and how the iterative loop is operationalized. This will be added without altering the demonstration focus.
  Revision: yes
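As an illustration of the kind of detail the referee asks for, the sketch below shows one way an LLM's output could be checked before it is accepted as a relational specification. It is only an assumption about what such a step might look like: the JSON layout, the function `validate_llm_spec`, and the schema contents are hypothetical, and the paper does not state that Pneuma-Seeker works this way.

```python
import json
from typing import Dict, List, Set


def validate_llm_spec(raw: str, schema: Dict[str, Set[str]]) -> List[str]:
    """Return problems found in an LLM-proposed spec; an empty list means it passed."""
    try:
        spec = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(spec, dict):
        return ["spec must be a JSON object"]

    table = spec.get("table")
    if not isinstance(table, str) or table not in schema:
        return [f"unknown table: {table!r}"]

    columns = spec.get("columns", [])
    if not isinstance(columns, list):
        return ["columns must be a list"]

    # Flag any column the known schema does not contain (a possible hallucination).
    return [
        f"unknown column: {table}.{col}"
        for col in columns
        if col not in schema[table]
    ]


# Assumed schema and LLM outputs, for illustration only.
schema = {"purchase_orders": {"vendor", "amount", "order_year"}}
good = '{"table": "purchase_orders", "columns": ["vendor", "amount"]}'
bad = '{"table": "purchase_orders", "columns": ["vendor", "supplier_rating"]}'

print(validate_llm_spec(good, schema))
print(validate_llm_spec(bad, schema))
```

Returning a list of problems rather than raising on the first one lets the caller show every mismatch to the user at once, or feed them all back to the model in a single correction round.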
Circularity Check
No circularity: system demonstration without derivations or self-referential claims
The paper is a forward demonstration of Pneuma-Seeker through two procurement use cases, describing how it reifies information needs as relational specifications. No equations, derivations, fitted parameters, uniqueness theorems, or self-citations appear in the provided text or abstract. The central claims rest on the system's design and LLM usage rather than reducing to any prior inputs by construction, satisfying the criteria for a self-contained non-circular presentation.