pith. sign in

arxiv: 2604.20855 · v3 · submitted 2026-02-24 · 💻 cs.IR · cs.MA

Caesar: Deep Agentic Web Exploration for Creative Answer Synthesis

Pith reviewed 2026-05-15 19:53 UTC · model grok-4.3

classification 💻 cs.IR cs.MA
keywords agentic web explorationdynamic knowledge graphadversarial refinementcreative synthesisnovelty in answersdeep research agentsinformation retrievalautonomous agents
0
0 comments X

The pith

Caesar builds a dynamic knowledge graph through deep web traversal and uses adversarial refinement to synthesize answers with higher novelty and structural coherence than existing agents.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Caesar as an agentic architecture that moves beyond flat web retrieval by constructing a dynamic knowledge graph during exploration. This graph guides the agent toward diverse, non-obvious connections across the web's structure, while adversarial refinement during synthesis actively seeks novel perspectives instead of confirming prior knowledge. If correct, the approach would enable autonomous systems to produce original artifacts and answers rather than derivative summaries, with measured gains of 13 to 23 percent over current deep research agents across output formats. A reader would care because it targets the core limitation of convergent search in agentic frameworks, potentially unlocking more creative uses of web-scale information.

Core claim

Caesar performs deep web traversal driven by a context-aware policy to build a dynamic knowledge graph that serves as a navigational scaffold. This graph maximizes information coverage by revealing connections that flat retrieval misses. Synthesis then occurs through adversarial refinement that seeks novel perspectives. The result is the generation of artifacts and answers with high novelty and structural coherence, delivering 13 to 23 percent improvement over state-of-the-art agents in creative synthesis challenges and strong performance across all tested output formats.

What carries the argument

Dynamic knowledge graph from context-aware web traversal policy that acts as navigational scaffold, paired with adversarial refinement for synthesis

If this is right

  • Produces creative answers and artifacts with measurably higher novelty and structural coherence
  • Delivers 13 to 23 percent gains over state-of-the-art deep research agents on synthesis benchmarks
  • Maintains dominance across multiple output formats including text, structured data, and hybrid artifacts
  • Shifts agent behavior from convergent search to associative synthesis of new ideas
  • Bridges information gathering directly to insight generation without intermediate derivative summaries

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The graph-based scaffold could be tested in domains like hypothesis generation in science, where non-obvious cross-domain links matter most.
  • Future agents might combine this traversal with memory mechanisms to handle even longer exploration horizons without losing coherence.
  • The adversarial component suggests a general pattern for countering confirmation bias in any retrieval-augmented generation system.
  • Scaling the policy to specialized subgraphs, such as academic paper networks, could be a direct next step for domain-specific creativity tasks.

Load-bearing premise

A dynamic knowledge graph built from web traversal will reliably surface diverse non-obvious information missed by flat retrieval, and adversarial refinement will produce genuinely novel insights rather than rephrased outputs.

What would settle it

Blind expert evaluation of novelty and coherence on identical creative prompts, where Caesar-generated answers show no statistically significant improvement over flat-retrieval baselines, would falsify the core performance claim.

Figures

Figures reproduced from arXiv: 2604.20855 by Elliot Meyerson, Jason Liang, Risto Miikkulainen.

Figure 1
Figure 1. Figure 1: Visualization of the Caesar architecture. (Left) Phase 1: Deep Web Exploration. A dynamic exploration policy controls a three-stage loop (Perceive, Think, Act) to traverse the web and to build a knowledge graph/database from insights. (Right) Phase 2: Adversarial Artifact Synthesis. Insights are retrieved to synthesize an initial draft. The agent then enters a recursive cycle, critiquing the current draft … view at source ↗
Figure 2
Figure 2. Figure 2: The knowledge graphs G created by Caesar during the deep web exploration phase for each of the five challenges. Brighter colors indicate further exploration depth from the root node (red) while cyan nodes indicate sources cited by the final artifact text. These figures show that the semantic content of the challenge has a substantial impact on exploration strategy and the diversity of network topologies ge… view at source ↗
Figure 3
Figure 3. Figure 3: Evolution of the knowledge graph G for Challenge 5 over 1000 steps. Brighter colors indicate further exploration depth from the root node (red). The figures show a transition from initial depth-first search to breadth-first branching later. The lower right contains a t-SNE [Maaten and Hinton, 2008] plot for node text embeddings in G that shows the diversity of insights collected during Caesar’s exploration… view at source ↗
read the original abstract

To advance from passive retrieval to creative discovery of new ideas, autonomous agents must be capable of deep, associative synthesis. However, current agentic frameworks prioritize convergent search, often resulting in derivative summaries that lack creativity. Caesar is an agentic architecture designed to bridge the gap between information gathering and synthesis of new insights. Unlike existing agents that treat the web as a flat sequence of disconnected documents, Caesar performs a deep web traversal to construct a dynamic knowledge graph. This graph then serves as a navigational scaffold, guiding the agent to diverse, non-obvious information that flat retrieval would never encounter. Caesar thus consists of two components: (1) exploration driven by a dynamic context-aware policy that maximizes information coverage across the web's topological structure, and (2) synthesis through adversarial refinement that actively seeks novel perspectives rather than confirming established priors. Caesar demonstrates the ability to generate artifacts and answers characterized by high novelty and structural coherence, achieving 13% to 23% improvement over state-of-the-art deep research agents in creative synthesis challenges, with strong dominance across all output formats.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 0 minor

Summary. The paper presents Caesar, an agentic architecture for creative answer synthesis. It constructs a dynamic knowledge graph via deep web traversal using a context-aware policy that maximizes information coverage, then applies adversarial refinement to generate novel perspectives. The central claim is that this yields artifacts with high novelty and structural coherence, delivering 13% to 23% improvement over state-of-the-art deep research agents across creative synthesis tasks and output formats.

Significance. If the claimed gains are shown to arise specifically from the dynamic KG traversal and adversarial refinement rather than from longer context or prompt engineering, the work would meaningfully advance agentic systems beyond convergent retrieval toward divergent synthesis. The architecture directly targets a recognized limitation in current frameworks, and reproducible evaluation protocols for novelty would strengthen its contribution.

major comments (3)
  1. [Architecture and Synthesis sections] The architecture description leaves the refinement objective, adversary definition, and stopping criteria unspecified. This is load-bearing for the central claim because the 13-23% improvement is attributed to adversarial refinement producing genuinely novel insights; without these details it remains possible that measured gains arise from increased context length alone.
  2. [Experimental Evaluation] No experimental details, baselines, metrics, controls, or ablation studies are supplied to support the quantitative claim of 13-23% improvement. The evaluation of novelty and structural coherence therefore cannot be assessed, undermining verification of the core empirical result.
  3. [Exploration Component] The assumption that dynamic KG traversal consistently surfaces non-obvious information missed by flat retrieval is stated but not validated with concrete traversal examples, coverage metrics, or comparison to standard retrieval baselines.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The comments highlight areas where additional specification and evidence are needed to substantiate the core claims. We address each point below and commit to revisions that will strengthen the manuscript without altering its central contributions.

read point-by-point responses
  1. Referee: [Architecture and Synthesis sections] The architecture description leaves the refinement objective, adversary definition, and stopping criteria unspecified. This is load-bearing for the central claim because the 13-23% improvement is attributed to adversarial refinement producing genuinely novel insights; without these details it remains possible that measured gains arise from increased context length alone.

    Authors: We agree that the current description is insufficiently precise on these elements. In the revised manuscript we will expand the Architecture and Synthesis sections with: (i) the explicit refinement objective (a minimax formulation that penalizes convergence to prior knowledge while rewarding divergence), (ii) the adversary definition (a secondary LLM agent trained to critique and propose counter-perspectives), and (iii) the stopping criteria (a combination of coverage saturation on the knowledge graph and a novelty threshold measured via embedding distance to existing nodes). These additions will include pseudocode and a short derivation showing why the mechanism is not reducible to context length alone. revision: yes

  2. Referee: [Experimental Evaluation] No experimental details, baselines, metrics, controls, or ablation studies are supplied to support the quantitative claim of 13-23% improvement. The evaluation of novelty and structural coherence therefore cannot be assessed, undermining verification of the core empirical result.

    Authors: We acknowledge the absence of these details in the submitted draft. The revised version will include a dedicated Experimental Evaluation section specifying: the full set of baselines (including GPT-4o with web browsing, ReAct, and recent deep research agents), the exact metrics (human-rated novelty on a 1-5 scale with inter-annotator agreement, structural coherence via graph-edit distance, and automated proxies), control conditions (fixed context length, no KG, no adversary), ablation studies isolating each component, and statistical tests (paired t-tests with p-values) supporting the reported 13-23% gains. We will also release the evaluation prompts and anonymized outputs. revision: yes

  3. Referee: [Exploration Component] The assumption that dynamic KG traversal consistently surfaces non-obvious information missed by flat retrieval is stated but not validated with concrete traversal examples, coverage metrics, or comparison to standard retrieval baselines.

    Authors: We will add a new subsection under Exploration that supplies: (i) two concrete traversal traces with step-by-step node expansions and the non-obvious facts discovered, (ii) quantitative coverage metrics (unique entity coverage, information entropy across the induced graph, and path diversity), and (iii) head-to-head comparisons against flat retrieval baselines (BM25, dense passage retrieval, and web-search-only agents) on the same query set, demonstrating higher recall of peripheral but relevant information. revision: yes

Circularity Check

0 steps flagged

No circularity in derivation chain

full rationale

The paper presents an empirical agentic architecture (dynamic knowledge graph traversal + adversarial refinement) and reports measured improvements (13-23%) over baselines. No equations, derivations, fitted parameters, or self-citations appear in the abstract or described text. Central claims rest on experimental outcomes rather than any quantity defined in terms of itself or reduced by construction to prior inputs. The architecture is described at the level of components and policy goals without mathematical formalization that could create self-definitional or fitted-input circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; the central claim rests on the unstated premise that the described components produce the claimed gains.

pith-pipeline@v0.9.0 · 5484 in / 1078 out tokens · 35231 ms · 2026-05-15T19:53:37.169891+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

34 extracted references · 34 canonical work pages · 1 internal anchor

  1. [1]

    Language Models are Few-Shot Learners

    Cornell University, 5 2020. doi: 10.48550/arxiv.2005.14165. Markus J. Buehler. Agentic deep graph reasoning yields self-organizing knowledge networks.Jour- nal of Materials Research, 40(15):2204–2242, 7 2025. ISSN 0884-1616. doi: 10.1557/s43578-0 25-01652-1. Ruth M. J. Byrne.The Rational Imagination: How People Create Alternatives to Reality. MIT Press, C...

  2. [2]

    Dedre Gentner

    URLhttps://blog.google/products-and-platforms/products/gemini/gemin i-3/. Dedre Gentner. Structure-mapping: A theoretical framework for analogy.Cognitive Science, 7(2): 155–170, 4 1983. ISSN 0364-0213. doi: 10.1207/s15516709cog0702_3. Joy Paul Guilford.The Nature of Human Intelligence. McGraw-Hill, New York, 1967. Aric A. Hagberg, Daniel A. Schult, and Pi...

  3. [3]

    Jieyi Long

    doi: 10.18653/v1/2024.emnlp-main.35. Jieyi Long. Large language model guided tree-of-thought.ArXiv preprint, abs/2305.08291, 5 2023. doi: 10.48550/arxiv.2305.08291. Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-sne.Journal of machine learning research, 9(Nov):2579–2605, 2008. ISSN 1532-4435. Guillermo Macbeth, Eugenia Razumiejczyk, ...

  4. [4]

    Agentic Large Language Models, a Survey , volume=

    ISSN 0033-295X. Aske Plaat, Max van Duijn, Niki van Stein, Mike Preuss, Peter van der Putten, and Kees Joost Batenburg. Agentic large language models, a survey.ArXiv preprint, abs/2503.23037, 12 2025. ISSN 1076-9757. doi: 10.1613/jair.1.18675. Hongjin Qian and Zheng Liu. Scent of Knowledge: Optimizing search-enhanced reasoning with information foraging.Ar...

  5. [5]

    Resolving knowledge conflicts in large language models,

    URLhttps://github.com/mem0ai/mem0. GitHub repository. Yike Wang, Shangbin Feng, Heng Wang, Weijia Shi, Vidhisha Balachandran, Tianxing He, and Yulia Tsvetkov. Resolving knowledge conflicts in large language models.ArXiv preprint, abs/2310.00935, 10 2023. doi: 10.48550/arxiv.2310.00935. Thomas B. Ward. Structured imagination: The role of category structure...

  6. [6]

    Constrained Synthesis Caesar 9.11 8.89 9.11 27.11 Gemini 3 (Deep) 7.78 8.33 7.67 23.78 Gemini 3 (Shallow) 6.89 6.22 6.89 20.00 Sonnet 4.5 (Shallow) 6.67 6.44 6.11 19.22 Sonnet 4.5 (Deep) 5.78 6.11 5.00 16.89 GPT-5.2 (Shallow) 5.56 5.89 5.00 16.45 GPT-5.2 (Deep) 4.11 6.44 3.33 13.88

  7. [7]

    Counterfactual Reasoning Caesar 9.44 9.11 9.44 27.99 Gemini 3 (Deep) 8.56 8.11 8.44 25.11 Sonnet 4.5 (Deep) 6.89 8.33 6.56 21.78 GPT-5.2 (Deep) 4.78 6.44 4.44 15.66 Gemini 3 (Shallow) 5.00 5.22 5.33 15.55 Sonnet 4.5 (Shallow) 3.89 4.89 3.56 12.34 GPT-5.2 (Shallow) 3.78 5.11 3.22 12.11

  8. [8]

    Cross-Domain Synthesis Caesar 9.56 8.56 9.44 27.56 Gemini 3 (Deep) 7.00 7.89 6.78 21.67 Sonnet 4.5 (Deep) 6.22 7.67 6.44 20.33 GPT-5.2 (Deep) 5.00 6.22 4.56 15.78 Gemini 3 (Shallow) 3.78 4.78 3.56 12.12 GPT-5.2 (Shallow) 3.33 4.78 3.11 11.22 Sonnet 4.5 (Shallow) 2.56 3.89 2.44 8.89

  9. [9]

    Meta-Creativity 20 Table 5 – continued from previous page Agent New Useful Surp. Total Caesar 9.22 9.22 8.89 27.33 Gemini 3 (Deep) 8.78 7.228.8924.89 Sonnet 4.5 (Deep) 7.44 7.33 7.22 21.99 GPT-5.2 (Deep) 5.78 5.67 4.67 16.12 Gemini 3 (Shallow) 4.44 4.22 4.22 12.88 Sonnet 4.5 (Shallow) 4.33 4.44 3.89 12.66 GPT-5.2 (Shallow) 4.11 4.67 3.56 12.34

  10. [10]

    Scores represent the mean of nine samples

    Open-Ended Synthesis Caesar8.228.568.0024.78 Gemini 3 (Deep) 8.33 6.44 8.67 23.44 Sonnet 4.5 (Deep) 7.33 8.00 6.89 22.22 GPT-5.2 (Shallow) 7.44 6.33 7.22 20.99 Sonnet 4.5 (Shallow)8.892.339.2220.44 GPT-5.2 (Deep) 5.67 6.78 4.78 17.23 Gemini 3 (Shallow) 5.33 6.56 4.67 16.56 Table 6: Detailed performance breakdown forUnconstrained ELI5 Answers. Scores repre...

  11. [11]

    Constrained Synthesis Caesar 8.89 8.89 8.67 26.45 Sonnet 4.5 (Deep) 6.89 8.22 6.44 21.55 Gemini 3 (Shallow) 6.67 5.89 6.56 19.12 Gemini 3 (Deep) 4.78 6.56 5.22 16.56 GPT-5.2 (Shallow) 5.78 5.33 5.22 16.33 GPT-5.2 (Deep) 4.67 7.33 4.22 16.22 Sonnet 4.5 (Shallow) 5.44 5.33 5.22 15.99

  12. [12]

    Counterfactual Reasoning Caesar 9.22 9.00 9.22 27.44 Sonnet 4.5 (Deep) 7.22 8.00 6.67 21.89 Gemini 3 (Deep) 6.67 6.33 6.89 19.89 GPT-5.2 (Deep) 5.00 7.00 4.67 16.67 Sonnet 4.5 (Shallow) 4.67 5.56 4.44 14.67 GPT-5.2 (Shallow) 3.33 5.00 2.67 11.00 Gemini 3 (Shallow) 2.67 3.56 2.44 8.67

  13. [13]

    Total Gemini 3 (Deep) 3.78 5.22 3.78 12.78 Sonnet 4.5 (Shallow) 3.89 4.56 3.67 12.12 Gemini 3 (Shallow) 2.33 3.11 2.00 7.44

    Cross-Domain Synthesis Caesar 9.11 9.11 8.89 27.11 Sonnet 4.5 (Deep) 6.78 7.78 6.78 21.34 GPT-5.2 (Deep) 5.44 6.89 5.00 17.33 GPT-5.2 (Shallow) 4.00 5.33 3.56 12.89 21 Table 6 – continued from previous page Agent New Useful Surp. Total Gemini 3 (Deep) 3.78 5.22 3.78 12.78 Sonnet 4.5 (Shallow) 3.89 4.56 3.67 12.12 Gemini 3 (Shallow) 2.33 3.11 2.00 7.44

  14. [14]

    Meta-Creativity Caesar 8.78 8.33 8.67 25.78 Sonnet 4.5 (Deep) 7.56 7.44 7.22 22.22 Gemini 3 (Deep) 6.78 5.56 6.89 19.23 GPT-5.2 (Deep) 6.11 6.56 6.00 18.67 Sonnet 4.5 (Shallow) 4.89 5.22 4.00 14.11 GPT-5.2 (Shallow) 4.22 4.67 3.44 12.33 Gemini 3 (Shallow) 3.00 3.11 2.56 8.67

  15. [15]

    Scores represent the mean of nine samples

    Open-Ended Synthesis Sonnet 4.5 (Deep) 7.11 7.89 7.3322.33 Caesar7.118.116.44 21.66 Sonnet 4.5 (Shallow)8.892.899.0020.78 Gemini 3 (Deep) 7.44 5.22 7.33 19.99 GPT-5.2 (Shallow) 6.78 5.22 6.67 18.67 GPT-5.2 (Deep) 5.67 5.78 5.11 16.56 Gemini 3 (Shallow) 5.11 6.67 4.33 16.11 Table 7: Detailed performance breakdown forELI5 Answers (450 Word Limit). Scores re...

  16. [16]

    Constrained Synthesis Caesar 8.78 8.78 9.00 26.56 Gemini 3 (Deep) 6.56 7.33 6.67 20.56 Sonnet 4.5 (Deep) 6.00 7.78 5.33 19.11 Sonnet 4.5 (Shallow) 6.22 6.11 6.00 18.33 GPT-5.2 (Shallow) 5.78 6.11 5.00 16.89 Gemini 3 (Shallow) 5.44 6.00 5.33 16.77 GPT-5.2 (Deep) 4.44 7.00 4.00 15.44

  17. [17]

    Counterfactual Reasoning Caesar 8.33 8.33 8.33 24.99 Gemini 3 (Deep) 8.22 7.67 8.11 24.00 Sonnet 4.5 (Deep) 6.78 8.00 6.67 21.45 Sonnet 4.5 (Shallow) 6.11 6.44 6.00 18.55 GPT-5.2 (Deep) 4.33 6.00 3.44 13.77 Gemini 3 (Shallow) 4.33 4.78 3.78 12.89 GPT-5.2 (Shallow) 3.11 4.67 2.56 10.34

  18. [18]

    Cross-Domain Synthesis Caesar 8.44 9.11 8.22 25.77 22 Table 7 – continued from previous page Agent New Useful Surp. Total Sonnet 4.5 (Shallow) 7.11 6.89 7.11 21.11 Sonnet 4.5 (Deep) 5.78 7.44 6.00 19.22 Gemini 3 (Deep) 5.22 6.22 5.33 16.77 GPT-5.2 (Deep) 3.89 5.89 3.11 12.89 GPT-5.2 (Shallow) 3.89 5.56 3.44 12.89 Gemini 3 (Shallow) 3.11 4.33 2.78 10.22

  19. [19]

    Meta-Creativity Caesar 8.78 8.89 8.56 26.23 Sonnet 4.5 (Deep) 6.89 6.67 6.56 20.12 Gemini 3 (Deep) 5.89 4.78 6.11 16.78 GPT-5.2 (Deep) 5.33 6.11 4.89 16.33 Sonnet 4.5 (Shallow) 5.44 5.11 4.89 15.44 GPT-5.2 (Shallow) 4.78 5.22 4.11 14.11 Gemini 3 (Shallow) 3.56 3.78 3.33 10.67

  20. [20]

    derivative

    Open-Ended Synthesis Sonnet 4.5 (Deep) 7.11 8.00 7.0022.11 Caesar7.338.336.44 22.10 Sonnet 4.5 (Shallow)8.893.009.1121.00 Gemini 3 (Deep) 7.33 5.00 7.89 20.22 GPT-5.2 (Shallow) 6.89 6.00 6.89 19.78 Gemini 3 (Shallow) 4.89 6.56 4.33 15.78 GPT-5.2 (Deep) 4.00 5.33 3.11 12.44 D A Qualitative Comparison of Answers To illustrate the fundamental distinction bet...

  21. [21]

    H o u s e h o l d C o n t i n u i t y Account

    Data O w n e r s h i p : The " H o u s e h o l d C o n t i n u i t y Account " is owned by the user via a Data Trust . The Carrier is a f i d u c i a r y p r o c e s s o r with no o w n e r s h i p rights

  22. [22]

    Duress Modes

    Consent under Duress : F ea tu res " Duress Modes " r e l e a s i n g only minimum a t t e s t a t i o n s . V i o l a t i o n s trigger a u t o m a t i c s a n c t i o n s

  23. [23]

    Silence

    V e r i f i c a t i o n : An I n d e p e n d e n t R e l i a b i l i t y R e g u l a t o r ( multi - s t a k e h o l d e r board ) audits c r y p t o g r a p h i c a l l y signed service re ce ipt s . " Silence " is treated as a risk signal . 28 [ I T E R A T I O N 3] Q : With the g o v e r n a n c e a r c h i t e c t u r e established , what is the sm al...

  24. [24]

    Must - Cover

    G e o g r a p h i c " Must - Cover " Co ntr ac t : Carrier must cover 100% of r e g i s t e r e d h o u s e h o l d s ( no cherry - picking )

  25. [25]

    Two - Part Tariff : Se as on al R et ai ner ( r e a d i n e s s ) + T r i g g e r e d Usage Pa yme nt s ( surge events )

  26. [26]

    Monsoon Pilot

    Stop - loss Pool : Reinsurance - style fund covers costs above c a t a s t r o p h i c t h r e s h o l d s . [ I T E R A T I O N 4] Q : To execute the " Monsoon Pilot " model defined above , what is the supply - side o p e r a t i n g model ? How do you di sp atc h / pay a h e t e r o g e n e o u s network of clinics / vendors in real - time ? A : O p e r...

  27. [27]

    R e s i l i e n c e P r i m i t i v e s

    " R e s i l i e n c e P r i m i t i v e s ": Se rv ic es c o n v e r t e d into s t a n d a r d i z e d modules with strict inputs / outputs ( e . g . , " Acute PTSD S t a b i l i z a t i o n ")

  28. [28]

    Tiered R eg is try : From l ic en sed NGOs ( Tier A ) to c o m m u n i t y actors ( Tier C , s p o n s o r e d by anchors )

  29. [29]

    Work Tokens

    D is pa tc h Engine : Issues " Work Tokens " based on location , availability , and equity c o n s t r a i n t s

  30. [30]

    Escrow Payment : Two - key release re qu ir es S up pl ier Proof + I n d e p e n d e n t Ve ri fi er c o n f i r m a t i o n

  31. [31]

    SUMMARY :

    P e r m i s s i o n i n g : S u p p l i e r s never own the user record ; they write outputs to the ledger via t e m p o r a r y consent tokens . G Detailed Ablation Results This section expands upon the ablation results provided in the main paper to better understand how they affect Caesar’s drafting processes and ELI5 outputs. G.1 Ablation Results for E...

  32. [32]

    ** EXPLORE ** new un - visited pages to discover novel i n f o r m a t i o n or k now le dg e

  33. [33]

    ** B ACK TR AC K ** to the im me di at e p r e v i o u s l y visited page to try a l t e r n a t i v e paths

  34. [34]

    ** W E B _ S E A R C H ** relevant topics to address current e x p l o r a t i o n insights Consider : - K no wl edg e gaps vs areas of s a t u r a t i o n - Depth of current e x p l o r a t i o n branch - Success patterns from previous d ec is io ns - Risk / reward of new e x p l o r a t i o n vs c o n s o l i d a t i o n K.2 Phase 2 Prompts (Adversarial...