CEDAR: Context Engineering for Agentic Data Science
Pith reviewed 2026-05-16 14:54 UTC · model grok-4.3
The pith
CEDAR automates data science tasks by structuring LLM prompts into interleaved plan and code sequences.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
CEDAR imposes structure into the initial prompt with DS-specific input fields that serve as instructions for the agentic system. The solution is materialized as an enumerated sequence of interleaved plan and code blocks generated by separate LLM agents, providing a readable structure to the context at any step of the workflow. Function calls for generating these intermediate texts and corresponding Python code ensure that data stays local, with only aggregate statistics and associated instructions injected into LLM prompts. Fault tolerance and context management are introduced via iterative code generation and smart history rendering. The viability of this agentic data scientist is shown on canonical Kaggle challenges.
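The fault-tolerance idea (iterative code generation) can be illustrated with a small retry loop. This is a minimal sketch under stated assumptions, not CEDAR's implementation: `run_code`, `generate_with_retries`, and `fake_code_agent` are hypothetical names, the LLM call is replaced by a stub, and a real system would execute generated code in a sandbox rather than with a bare `exec`.

```python
import traceback
from typing import Callable, Optional, Tuple


def run_code(code: str, namespace: dict) -> Optional[str]:
    """Execute code locally; return None on success or a one-line error message."""
    try:
        exec(code, namespace)  # assumption: a real system would sandbox this
        return None
    except Exception:
        # Keep only the final traceback line so the feedback stays small.
        return traceback.format_exc().strip().splitlines()[-1]


def generate_with_retries(
    generate: Callable[[str, Optional[str]], str],
    plan: str,
    max_attempts: int = 3,
) -> Tuple[Optional[str], int]:
    """Ask `generate` for code until it runs, feeding the last error back in."""
    error: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        code = generate(plan, error)
        error = run_code(code, {})
        if error is None:
            return code, attempt
    return None, max_attempts


def fake_code_agent(plan: str, last_error: Optional[str]) -> str:
    """Stand-in for an LLM call: the first draft is buggy, the retry is fixed."""
    if last_error is None:
        return "result = total / count"  # fails: the names are undefined
    return "total, count = 10, 4\nresult = total / count"


code, attempts = generate_with_retries(fake_code_agent, "compute the mean")
print(attempts)
```

Only the short error line, not the full traceback or any data, is passed back to the generator, which keeps the retry prompt small.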
What carries the argument
Structured DS-specific prompt fields combined with an agentic workflow that generates interleaved plan-code blocks via separate LLM agents and function calls to keep data local.
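To make the structure concrete, the sketch below models a structured initial prompt and an enumerated sequence of interleaved plan and code blocks. The field names (`objective`, `data_description`, `target`, `metric`), the class names, and the rendering format are all assumptions chosen for illustration; the page does not list CEDAR's actual input fields.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TaskPrompt:
    """Hypothetical DS-specific input fields; CEDAR's real fields may differ."""
    objective: str
    data_description: str
    target: str
    metric: str

    def render(self) -> str:
        return (
            f"Objective: {self.objective}\n"
            f"Data: {self.data_description}\n"
            f"Target: {self.target}\n"
            f"Metric: {self.metric}"
        )


@dataclass
class Step:
    plan: str  # text written by a planning agent
    code: str  # Python written by a separate coding agent


@dataclass
class Solution:
    prompt: TaskPrompt
    steps: List[Step] = field(default_factory=list)

    def render(self) -> str:
        """Render the prompt followed by enumerated, interleaved plan/code blocks."""
        parts = [self.prompt.render()]
        for number, step in enumerate(self.steps, start=1):
            parts.append(f"[{number}] PLAN\n{step.plan}")
            parts.append(f"[{number}] CODE\n{step.code}")
        return "\n\n".join(parts)


# Illustrative values only; they are not taken from the paper.
solution = Solution(TaskPrompt(
    objective="Predict the label column",
    data_description="tabular CSV with numeric and categorical columns",
    target="label",
    metric="accuracy",
))
solution.steps.append(Step("Load the training data.", "rows = load_rows('train.csv')"))
solution.steps.append(Step("Inspect missing values.", "report_missing(rows)"))
print(solution.render())
```

Because every step carries its own number and its plan precedes its code, the rendered context stays readable at any point in the workflow.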
If this is right
- Data science solutions gain readable structure from the plan-code sequence at every workflow step.
- Raw data never enters LLM prompts, reducing context size through local function calls and aggregate statistics only.
- Iterative code generation and smart history rendering add fault tolerance to the automation process.
- Canonical Kaggle challenges can be solved with the agentic setup, showing reduced need for constant oversight.
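The second point above, keeping raw data out of the prompt, can be sketched as a local summarization function whose output is the only thing an LLM would see. The function names and the choice of statistics are assumptions for illustration; the page does not say which aggregates CEDAR computes.

```python
import statistics
from typing import Any, Dict, List


def summarize_column(rows: List[Dict[str, Any]], column: str) -> Dict[str, Any]:
    """Compute aggregate statistics for one column; rows never leave this function."""
    values = [row[column] for row in rows if row.get(column) is not None]
    summary: Dict[str, Any] = {
        "column": column,
        "count": len(values),
        "missing": len(rows) - len(values),
    }
    if values and all(isinstance(v, (int, float)) for v in values):
        summary.update(kind="numeric", mean=statistics.fmean(values),
                       min=min(values), max=max(values))
    else:
        summary.update(kind="categorical", n_unique=len(set(values)))
    return summary


def render_data_context(rows: List[Dict[str, Any]], columns: List[str]) -> str:
    """Build the text that would be injected into a prompt: aggregates only."""
    lines = []
    for column in columns:
        summary = summarize_column(rows, column)
        details = ", ".join(f"{k}={v}" for k, v in summary.items() if k != "column")
        lines.append(f"{column}: {details}")
    return "\n".join(lines)


# Toy rows for illustration only.
rows = [
    {"age": 20, "city": "Oslo"},
    {"age": 40, "city": "Bergen"},
    {"age": None, "city": "Oslo"},
]
context = render_data_context(rows, ["age", "city"])
print(context)
```

The prompt size then depends on the number of columns rather than the number of rows, which is the sense in which local function calls reduce context size.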
Where Pith is reading between the lines
- The method could extend to other agentic domains like code refactoring if similar structure is imposed on prompts.
- Performance on proprietary internal datasets might differ from public Kaggle results due to unseen data characteristics.
- Adding external knowledge retrieval could further stabilize outputs on tasks requiring domain expertise beyond standard benchmarks.
Load-bearing premise
The structured prompts and agentic workflow reliably produce correct and complete solutions for arbitrary data science tasks without frequent human intervention or hitting unmanageable context limits.
What would settle it
Running CEDAR on a Kaggle challenge involving high-dimensional data and advanced feature engineering, then checking whether the generated code runs to completion and produces valid results without manual fixes.
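The "runs to completion" half of that check is easy to automate. The harness below is a hypothetical sketch, not part of CEDAR: it writes a generated script to a temporary directory, runs it in a fresh interpreter, and records whether it exited cleanly. Judging whether the results are valid (for example, against a challenge's submission format) would still need a task-specific check.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def runs_to_completion(script: str, timeout_s: float = 60.0) -> dict:
    """Run a generated script in a fresh interpreter and report the outcome."""
    with tempfile.TemporaryDirectory() as workdir:
        path = Path(workdir) / "generated.py"
        path.write_text(script, encoding="utf-8")
        try:
            proc = subprocess.run(
                [sys.executable, str(path)],
                cwd=workdir,
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return {"completed": False, "reason": "timeout", "stdout": ""}
    reason = "ok" if proc.returncode == 0 else "error"
    return {"completed": proc.returncode == 0, "reason": reason, "stdout": proc.stdout}


print(runs_to_completion("print('done')"))
```

Counting how many manual edits a script needs before `completed` becomes true would give the "without manual fixes" criterion a number.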
Original abstract
We demonstrate CEDAR, an application for automating data science (DS) tasks with an agentic setup. Solving DS problems with LLMs is an underexplored area that has immense market value. The challenges are manifold: task complexities, data sizes, computational limitations, and context restrictions. We show that these can be alleviated via effective context engineering. We first impose structure into the initial prompt with DS-specific input fields, that serve as instructions for the agentic system. The solution is then materialized as an enumerated sequence of interleaved plan and code blocks generated by separate LLM agents, providing a readable structure to the context at any step of the workflow. Function calls for generating these intermediate texts, and for corresponding Python code, ensure that data stays local, and only aggregate statistics and associated instructions are injected into LLM prompts. Fault tolerance and context management are introduced via iterative code generation and smart history rendering. The viability of our agentic data scientist is demonstrated using canonical Kaggle challenges.
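The abstract does not spell out what "smart history rendering" does. One plausible reading, offered purely as an assumption, is that older steps are compressed when the history is rendered into a prompt while recent steps stay complete. The sketch below keeps every plan but replaces the code of older steps with a one-line placeholder; `render_history` and `keep_full` are hypothetical names.

```python
from typing import List, Tuple

PlanCode = Tuple[str, str]  # (plan text, code text)


def render_history(steps: List[PlanCode], keep_full: int = 2) -> str:
    """Render enumerated steps, omitting the code of all but the newest ones."""
    cutoff = max(len(steps) - keep_full, 0)
    lines = []
    for number, (plan, code) in enumerate(steps, start=1):
        lines.append(f"[{number}] PLAN: {plan}")
        if number > cutoff:
            lines.append(f"[{number}] CODE:\n{code}")
        else:
            # Assumption: older code is summarized by its length only.
            lines.append(f"[{number}] CODE: (omitted, {len(code.splitlines())} lines)")
    return "\n".join(lines)


history = [
    ("Load data", "rows = read_rows()\nprint(len(rows))"),
    ("Clean data", "rows = drop_missing(rows)"),
    ("Fit model", "model = fit(rows)"),
]
print(render_history(history, keep_full=1))
```

Under this reading the plans act as a running summary, so the context grows slowly even as the number of steps increases.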
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents CEDAR, an agentic LLM-based system for automating data science tasks. It addresses challenges of task complexity, data size, computational limits, and context restrictions through context engineering: DS-specific structured input fields in the initial prompt, separate agents generating interleaved enumerated plan/code blocks, function calls that keep raw data local while injecting only aggregates and instructions, and iterative generation with smart history rendering for fault tolerance. Viability is asserted via demonstrations on canonical Kaggle challenges.
Significance. If the workflow reliably produces correct, complete solutions with limited human intervention, the structured context-management approach could advance practical agentic data science automation and offer a template for handling long-horizon DS workflows under LLM constraints. The emphasis on readable interleaved plans and data isolation is a concrete engineering contribution.
Major comments (2)
- [Abstract and Kaggle demonstrations section] The central claim that context engineering alleviates the listed challenges and that viability is demonstrated on Kaggle challenges is unsupported by any quantitative evidence. No success rates, error rates, completion statistics, context-length traces, failure-mode analysis, or comparisons to simpler prompting baselines appear in the evaluation section.
- [§3 (Workflow and context management)] The weakest assumption—that the interleaved plan/code workflow and function-call isolation produce correct and complete solutions for arbitrary DS tasks without frequent human intervention or context overflow—is stated but never tested or quantified. No ablation of the individual engineering choices (structured fields, separate agents, iterative rendering) is provided.
Minor comments (2)
- [Abstract] Clarify the exact Kaggle challenges used, the precise success criteria applied, and whether any human oversight occurred during the demonstrations.
- [Introduction] Add a short related-work subsection contrasting CEDAR with prior agentic DS or code-generation frameworks to better situate the context-engineering contribution.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We agree that quantitative metrics and component ablations would strengthen the claims regarding context engineering and will revise the manuscript to include them.
Point-by-point responses
Referee: [Abstract and Kaggle demonstrations section] The central claim that context engineering alleviates the listed challenges and that viability is demonstrated on Kaggle challenges is unsupported by any quantitative evidence. No success rates, error rates, completion statistics, context-length traces, failure-mode analysis, or comparisons to simpler prompting baselines appear in the evaluation section.
Authors: We acknowledge that the current evaluation relies on qualitative demonstrations. In the revision we will add quantitative results to the Kaggle section, including success rates and completion statistics across multiple challenges, context-length traces, failure-mode analysis, and direct comparisons to simpler prompting baselines. These additions will provide empirical support for the alleviation of task complexity, data size, and context restrictions. revision: yes
Referee: [§3 (Workflow and context management)] The weakest assumption—that the interleaved plan/code workflow and function-call isolation produce correct and complete solutions for arbitrary DS tasks without frequent human intervention or context overflow—is stated but never tested or quantified. No ablation of the individual engineering choices (structured fields, separate agents, iterative rendering) is provided.
Authors: We agree that the assumptions in §3 require quantification and that ablations of the individual choices are needed. In the revised manuscript we will include ablations isolating the effects of structured input fields, separate plan/code agents, and iterative smart-history rendering. We will also report statistics on solution correctness, frequency of human intervention, and context-overflow events observed in the Kaggle demonstrations. revision: yes
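The ablation described in this response could be organized as a full on/off grid over the three named components, with a success rate computed per configuration. This is a sketch of one possible bookkeeping scheme, not the authors' protocol; the component keys and function names are assumptions, and the outcome list in the usage line is a placeholder rather than a reported result.

```python
from itertools import product
from typing import Dict, List

COMPONENTS = ("structured_fields", "separate_agents", "history_rendering")


def ablation_grid() -> List[Dict[str, bool]]:
    """Enumerate every on/off combination of the three components."""
    return [
        dict(zip(COMPONENTS, flags))
        for flags in product((True, False), repeat=len(COMPONENTS))
    ]


def success_rate(outcomes: List[bool]) -> float:
    """Fraction of runs that finished with a valid result."""
    if not outcomes:
        raise ValueError("no runs recorded")
    return sum(outcomes) / len(outcomes)


grid = ablation_grid()
print(len(grid), grid[0])
# Placeholder outcomes for illustration only, not measured data.
print(success_rate([True, False, True, True]))
```

The first grid entry is the full system and the last is the baseline with all three components disabled, which also covers the comparison to simpler prompting that the referee asks for.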
Circularity Check
No circularity: system description with no derivations or self-referential claims
Full rationale
The paper presents CEDAR as an engineering system for agentic data science via structured prompts, interleaved plan/code blocks, function-call isolation, and iterative generation. No equations, fitted parameters, predictions, or first-principles derivations appear in the abstract or described content. The central claim of viability on Kaggle challenges is an empirical demonstration rather than a reduction to inputs by construction. No self-citations, uniqueness theorems, or ansatzes are invoked as load-bearing steps. The workflow is self-contained as a descriptive architecture without circular reduction.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: LLMs can follow structured instructions for planning and coding in data science tasks.