
arxiv: 2605.15202 · v1 · submitted 2026-04-01 · 💻 cs.AI · cs.CL · cs.IR

DeepSlide: From Artifacts to Presentation Delivery

Pith reviewed 2026-05-19 17:58 UTC · model grok-4.3

classification 💻 cs.AI · cs.CL · cs.IR
keywords AI presentation generation · multi-agent systems · narrative planning · slide-script generation · presentation delivery · benchmark evaluation · human-in-the-loop AI

The pith

DeepSlide is a multi-agent system that plans time-budgeted narratives and generates synced slides and scripts to improve delivery while matching visual quality.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Most AI slide tools focus on creating visually plausible decks but leave pacing, narrative structure, and script synergy to the user. DeepSlide addresses this by supporting the full process from requirement gathering through rehearsal, using a logical-chain planner that assigns time budgets to each narrative step. It adds a content retriever for grounding claims, sequential rendering that inherits styles, and sandboxed execution to keep outputs renderable. A new dual-scoreboard benchmark measures static slide quality separately from dynamic delivery aspects such as flow and attention guidance. Across twenty domains the system matches strong baselines on appearance yet shows larger gains on delivery metrics.
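
One way to read "sequential rendering that inherits styles" is that each slide is rendered from its own content plus the style state left by the previous slide only. The sketch below illustrates that reading and nothing more: `Style`, `SlideSpec`, `render_slide`, and the override rule are hypothetical names chosen for this example, not the paper's implementation.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Style:
    """Hypothetical style state carried from one slide to the next."""
    theme: str = "light"
    font: str = "Inter"


@dataclass(frozen=True)
class SlideSpec:
    """Hypothetical slide input: a title plus optional style changes."""
    title: str
    overrides: Optional[dict] = None


def render_slide(spec: SlideSpec, inherited: Style) -> tuple[str, Style]:
    """Render one slide from its spec and the previous slide's style only."""
    style = replace(inherited, **(spec.overrides or {}))
    html = (
        f'<section class="{style.theme}" '
        f'style="font-family:{style.font}">{spec.title}</section>'
    )
    return html, style


def render_deck(specs: list[SlideSpec], initial: Style) -> list[str]:
    """Markov-style pass: slide i sees the style of slide i-1, not older slides."""
    rendered = []
    style = initial
    for spec in specs:
        html, style = render_slide(spec, style)
        rendered.append(html)
    return rendered


deck = render_deck(
    [SlideSpec("Intro"), SlideSpec("Method", {"theme": "dark"}), SlideSpec("Results")],
    Style(),
)
```

In this sketch the third slide stays dark because it inherits from the second, which is the kind of consistency a style-inheriting renderer would be expected to give.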

Core claim

DeepSlide is a human-in-the-loop multi-agent system that supports the full presentation process from requirement elicitation and time-budgeted narrative planning, to evidence-grounded slide-script generation, attention augmentation, and rehearsal support. It integrates a controllable logical-chain planner with per-node time budgets, a lightweight content-tree retriever for grounding, Markov-style sequential rendering with style inheritance, and sandboxed execution with minimal repair to ensure renderability. Evaluation on a dual-scoreboard benchmark across twenty domains shows it matches strong baselines on artifact quality while achieving larger gains on delivery metrics including narrative

What carries the argument

A controllable logical-chain planner with per-node time budgets that structures the narrative and enforces pacing precision during generation.
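
To make "per-node time budgets" concrete, here is a minimal sketch of how a total talk duration could be split across the nodes of a logical chain. The proportional-weight rule, the `Node` fields, and `assign_budgets` are assumptions made for illustration; the paper's planner may allocate time quite differently.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical narrative step in a logical chain."""
    title: str
    weight: float      # relative importance (assumed input)
    seconds: int = 0   # time budget filled in by assign_budgets


def assign_budgets(chain: list[Node], total_seconds: int) -> list[Node]:
    """Split total_seconds across the chain in proportion to node weights.

    Largest-remainder rounding keeps the budgets summing exactly to the total.
    """
    total_weight = sum(node.weight for node in chain)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    exact = [total_seconds * node.weight / total_weight for node in chain]
    for node, share in zip(chain, exact):
        node.seconds = int(share)  # floor, since shares are non-negative
    leftover = total_seconds - sum(node.seconds for node in chain)
    # Give the remaining seconds to the nodes with the largest fractional parts.
    by_remainder = sorted(
        range(len(chain)), key=lambda i: exact[i] - int(exact[i]), reverse=True
    )
    for i in by_remainder[:leftover]:
        chain[i].seconds += 1
    return chain


chain = assign_budgets(
    [Node("Hook", 1), Node("Method", 3), Node("Results", 2), Node("Takeaway", 1)],
    total_seconds=600,
)
budgets = [node.seconds for node in chain]
```

A budget that sums exactly to the requested duration is what makes pacing checkable: a script for each node can be compared against its `seconds` value.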

Load-bearing premise

The dual-scoreboard benchmark cleanly separates static artifact quality from dynamic delivery excellence without overlap or bias in the evaluation metrics.

What would settle it

A head-to-head user study in which independent raters score the same source content delivered by DeepSlide versus baseline generators and find no advantage or a reversal on narrative flow, pacing precision, or attention guidance scores.
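
As an illustration of how such a paired comparison could be scored, the sketch below runs an exact one-sided sign-flip test on per-item rating differences. It is an assumption about one reasonable analysis, not a procedure from the paper; the function name and the ratings are made up, and full enumeration is only practical for small samples.

```python
from itertools import product


def paired_sign_flip_pvalue(system: list[float], baseline: list[float]) -> float:
    """Exact one-sided sign-flip test for sum(system - baseline) > 0.

    Under the null hypothesis each paired difference is equally likely to
    have either sign, so every sign assignment is enumerated (2**n cases).
    """
    diffs = [a - b for a, b in zip(system, baseline)]
    observed = sum(diffs)
    at_least_as_large = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        if sum(s * d for s, d in zip(signs, diffs)) >= observed - 1e-12:
            at_least_as_large += 1
    return at_least_as_large / total


# Made-up ratings for four talks, each rated under both systems.
p_value = paired_sign_flip_pvalue([4, 5, 4, 5], [3, 3, 3, 3])
```

A large p-value, or a negative observed sum, would correspond to the "no advantage or a reversal" outcome described above.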

Figures

Figures reproduced from arXiv: 2605.15202 by Haoseng Liu, Jiahang Li, Ming Yang, Weiguo Zheng, Yuzheng Cai, Zhiwei Zhang.

Figure 1: Main experiment on 20 domains (left: delivery scoreboard; right: artifact scoreboard).
Figure 2: Limitations of existing approaches and the …
Figure 3: Overview. Stage 1: requirement elicitation and narrative proposal; Stage 2: logical chain editing and evidence-grounded generation; Stage 3: interactive slide refinement and attention-oriented augmentation; Stage 4: rehearsal and dual-scoreboard evaluation. Effects in Stage 3: Image Focus, Text to Diagram, Keynote, Data Visualization, Motion, Background, Auto Layout (Qwen3.5 [24] as example).
Figure 4: Requirement elicitation and narrative proposal (Stage 1).
Figure 7: Secondary experiment on audience-specific evaluation.
Figure 8: DeepSlide vs. Manus, varying audience {BS, MS, PhD} and duration {5, 10, 15} (Case 1).
Figure 9: Varying retrieval depth.
Figure 10: DeepSlide's UI and attention-augmented effect preview.
Original abstract

Presentations are a primary medium for scholarly communication, yet most AI slide generators optimize the artifact (a visually plausible deck) while under-optimizing the delivery process (pacing, narrative, and presentation preparation). We present DeepSlide, a human-in-the-loop multi-agent system that supports preparing the full presentation process, from requirement elicitation and time-budgeted narrative planning, to evidence-grounded slide-script generation, attention augmentation, and rehearsal support. DeepSlide integrates (i) a controllable logical-chain planner with per-node time budgets, (ii) a lightweight content-tree retriever for grounding, (iii) Markov-style sequential rendering with style inheritance, and (iv) sandboxed execution with minimal repair to ensure renderability. We further introduce a dual-scoreboard benchmark that cleanly separates static artifact quality from dynamic delivery excellence. Across 20 domains and diverse audience profiles, DeepSlide matches strong baselines on artifact quality while consistently achieving larger gains on delivery metrics, improving narrative flow, pacing precision, and slide-script synergy with clearer attention guidance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript presents DeepSlide, a human-in-the-loop multi-agent system for full presentation preparation, integrating a controllable logical-chain planner with per-node time budgets, a lightweight content-tree retriever for grounding, Markov-style sequential rendering with style inheritance, and sandboxed execution for renderability. It introduces a dual-scoreboard benchmark to separate static artifact quality from dynamic delivery excellence and claims that, across 20 domains and diverse audience profiles, DeepSlide matches strong baselines on artifact quality while achieving larger gains on delivery metrics including narrative flow, pacing precision, slide-script synergy, and attention guidance.

Significance. If the empirical results hold under a validated benchmark, the work could meaningfully advance AI-assisted scholarly communication by shifting focus from static slide artifacts to the full delivery process. The integration of planning, retrieval, and rehearsal components represents a practical step toward more usable presentation tools, and the dual-scoreboard idea, if shown to be non-confounded, would be a useful methodological contribution for future evaluations in this area.

major comments (1)
  1. [Benchmark and Evaluation] The central empirical claim—that DeepSlide matches baselines on artifact quality but shows larger gains on delivery metrics—depends on the dual-scoreboard benchmark cleanly isolating static slide quality from dynamic aspects without overlap or bias. The manuscript provides no description of metric definitions, orthogonality tests between the two scoreboards, or controls for potential leakage (e.g., clearer artifact slides enabling better pacing and synergy by construction). This is load-bearing for interpreting the differential-gain result and must be addressed with explicit validation.
minor comments (1)
  1. [Abstract] The abstract states performance gains but does not include any quantitative results, baseline details, or statistical tests; moving a concise summary of key numbers into the abstract would improve readability.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback and for recognizing the potential of DeepSlide to advance AI-assisted scholarly communication by emphasizing the full delivery process. We address the major comment on the dual-scoreboard benchmark below.

Point-by-point responses
  1. Referee: The central empirical claim—that DeepSlide matches baselines on artifact quality but shows larger gains on delivery metrics—depends on the dual-scoreboard benchmark cleanly isolating static slide quality from dynamic aspects without overlap or bias. The manuscript provides no description of metric definitions, orthogonality tests between the two scoreboards, or controls for potential leakage (e.g., clearer artifact slides enabling better pacing and synergy by construction). This is load-bearing for interpreting the differential-gain result and must be addressed with explicit validation.

    Authors: We agree that explicit validation of the separation is essential to support the differential-gain interpretation. In the revised manuscript we will expand the benchmark description (currently in Section 4) with (i) precise definitions and scoring procedures for every metric on both the artifact and delivery scoreboards, (ii) quantitative orthogonality analysis (Pearson and Spearman correlations across the 20 domains), and (iii) leakage-control experiments that evaluate delivery metrics on fixed baseline artifacts and artifact metrics on fixed scripts. These additions will directly address the concern about confounding and provide the requested explicit validation.

    revision: yes
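
The orthogonality analysis promised in the rebuttal can be sketched as follows: given one artifact score and one delivery score per domain, compute the Pearson correlation on the raw values and the Spearman correlation on their ranks. This is a plain-Python illustration under assumed inputs; the score lists are invented, and the helper names are not from the manuscript.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented per-domain scores, for illustration only.
artifact_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
delivery_scores = [1.0, 4.0, 9.0, 16.0, 25.0]
r = pearson(artifact_scores, delivery_scores)
rho = spearman(artifact_scores, delivery_scores)
```

Correlations near zero across domains would support the claim that the two scoreboards measure different things; values near one, as in this deliberately monotone toy data, would suggest the leakage the referee is worried about.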

Circularity Check

0 steps flagged

No circularity in derivation chain

Full rationale

The manuscript presents an engineering system (multi-agent pipeline with planner, retriever, sequential rendering, and sandboxed execution) plus an introduced dual-scoreboard benchmark. No equations, fitted parameters, or derivations appear in the provided text. The central empirical claim rests on cross-domain comparisons rather than any quantity that reduces by construction to its own inputs or to a self-citation chain. The benchmark's separation of artifact versus delivery metrics is asserted but not shown to be tautological; it functions as an external evaluation protocol whose validity is open to independent verification. This is the normal case of a self-contained systems paper whose claims do not collapse into definitional equivalence.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

Review based on abstract only; no explicit free parameters, axioms, or invented entities beyond the system components and benchmark are described. The dual-scoreboard benchmark is treated as a new evaluation construct without independent validation details.

invented entities (1)
  • dual-scoreboard benchmark (no independent evidence)
    purpose: To cleanly separate static artifact quality from dynamic delivery excellence in evaluation
    Introduced as a new measurement approach in the abstract to support claims of larger gains on delivery metrics.

pith-pipeline@v0.9.0 · 5723 in / 1255 out tokens · 52526 ms · 2026-05-19T17:58:19.195575+00:00 · methodology


Reference graph

Works this paper leans on

94 extracted references · 94 canonical work pages · 1 internal anchor

  1. [1] Alan Kelly. How Scientists Communicate: Dispatches from the Frontiers of Knowledge. 09 2020. ISBN 9780190936600. doi: 10.1093/oso/9780190936600.001.0001

  2. [2] Michael Alley. The craft of scientific presentations: Critical steps to succeed and critical errors to avoid. Physics Today, 57, 07 2004. doi: 10.1063/1.1784305

  3. [3] Jiaxin Ge, Zora Zhiruo Wang, Xuhui Zhou, Yi-Hao Peng, Sanjay Subramanian, Qinyue Tan, Maarten Sap, Alane Suhr, Daniel Fried, Graham Neubig, et al. Autopresent: Designing structured visuals from scratch. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 2902–2911, 2025

  4. [4] Himanshu Maheshwari, Sambaran Bandyopadhyay, Aparna Garimella, and Anandhavelu Natarajan. Presentations are not always linear! gnn meets llm for document-to-presentation transformation with attribution. arXiv preprint arXiv:2405.13095, 2024

  5. [5] Ishani Mondal, Shwetha S, Anandhavelu Natarajan, Aparna Garimella, Sambaran Bandyopadhyay, and Jordan Boyd-Graber. Presentations by the humans and for the humans: Harnessing LLMs for generating persona-aware slides from documents. In Yvette Graham and Matthew Purver, editors, Proceedings of the 18th Conference of the European Chapter of the Association for...

  6. [6] langchain-ai/langchain: The platform for reliable agents. https://github.com/langchain-ai/langchain,

  7. [8] Significant-gravitas/autogpt: Autogpt is the vision of accessible ai for everyone, to use and to build on. our mission is to provide the tools, so that you can focus on what matters. https://github.com/Significant-Gravitas/AutoGPT, 2026. Accessed: 2026-02-26

  8. [9] microsoft/autogen: A programming framework for agentic ai. https://github.com/microsoft/autogen,

  9. [10] Accessed: 2026-02-26

  10. [11] Guohao Li, Hasan Abed Al Kader Hammoud, Hani Itani, Dmitrii Khizbullin, and Bernard Ghanem. Camel: Communicative agents for "mind" exploration of large language model society. In Thirty-seventh Conference on Neural Information Processing Systems, 2023

  11. [12] Sirui Hong, Mingchen Zhuge, Jonathan Chen, Xiawu Zheng, Yuheng Cheng, Jinlin Wang, Ceyao Zhang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, Liyang Zhou, Chenyu Ran, Lingfeng Xiao, Chenglin Wu, and Jürgen Schmidhuber. MetaGPT: Meta programming for a multi-agent collaborative framework. In The Twelfth International Conference on Learning Representations, 2...

  12. [13] Manus. https://manus.im/app, 2026. Accessed: 2026-02-26

  13. [14] Gamma. https://gamma.app/, 2026. Accessed: 2026-02-26

  14. [15] Google notebooklm. https://notebooklm.google/, 2026. Accessed: 2026-02-26

  15. [16] Qwen. https://qwen.ai/home, 2026. Accessed: 2026-02-26

  16. [17] Coze: Next-gen ai app developing platform. https://www.coze.com/, 2026. Accessed: 2026-02-26

  17. [18] Hao Zheng, Xinyan Guan, Hao Kong, Wenkai Zhang, Jia Zheng, Weixiang Zhou, Hongyu Lin, Yaojie Lu, Xianpei Han, and Le Sun. PPTAgent: Generating and evaluating presentations beyond text-to-slides. In Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, and Violet Peng, editors, Proceedings of the 2025 Conference on Empirical Methods in Natural Lang...

  18. [19] Richard E. Mayer. Coherence Principle, page 113–133. Cambridge University Press, 2001

  19. [20] Tushar Aggarwal and Aarohi Bhand. Pass: Presentation automation for slide generation and speech. arXiv preprint arXiv:2501.06497, 2025

  20. [21] Xiaojie Xu, Xinli Xu, Sirui Chen, Haoyu Chen, Fan Zhang, and Ying-Cong Chen. Pregenie: An agentic framework for high-quality visual presentation generation. arXiv preprint arXiv:2505.21660, 2025

  21. [22] Isabel Cachola, Silviu Cucerzan, Allen Herring, Vuksan Mijovic, Erik Oveson, and Sujay Kumar Jauhar. Knowledge-centric templatic views of documents. arXiv preprint arXiv:2401.06945, 2024

  22. [23] Yunqiao Yang, Wenbo Li, Houxing Ren, Zimu Lu, Ke Wang, Zhiyuan Huang, Zhuofan Zong, Mingjie Zhan, and Hongsheng Li. Slidesgen-bench: Evaluating slides generation via computational and quantitative metrics. arXiv preprint arXiv:2601.09487, 2026

  23. [24] Michael Ofengenden, Yunze Man, Ziqi Pang, and Yu-Xiong Wang. Pptarena: A benchmark for agentic powerpoint editing. arXiv preprint arXiv:2512.03042, 2025

  24. [25] Zheng Huang, Xukai Liu, Tianyu Hu, Kai Zhang, and Ye Liu. Pptbench: Towards holistic evaluation of large language models for powerpoint layout and design understanding. arXiv preprint arXiv:2512.02624, 2025

  25. [26] Qwen Team. Qwen3.5: Towards native multimodal agents, February 2026. URL https://qwen.ai/blog?id=qwen3.5

  26. [27] Siyi Zhou, Yiquan Zhou, Yi He, Xun Zhou, Jinchao Wang, Wei Deng, and Jingchen Shu. Indextts2: A breakthrough in emotionally expressive and duration-controlled auto-regressive zero-shot text-to-speech. arXiv preprint arXiv:2506.21619, 2025

  27. [28] Dayuanjiang/next-ai-draw-io: A next.js web application that integrates ai capabilities with draw.io diagrams. this app allows you to create, modify, and enhance diagrams through natural language commands and ai-assisted visualization. https://github.com/DayuanJiang/next-ai-draw-io, 2026. Accessed: 2026-02-27

  28. [29] Deqing Li, Honghui Mei, Yi Shen, Shuang Su, Wenli Zhang, Junting Wang, Ming Zu, and Wei Chen. Echarts: A declarative framework for rapid construction of web-based visualization. Visual Informatics, 2(2):136–146, 2018. ISSN 2468-502X. doi: https://doi.org/10.1016/j.visinf.2018.04.011. URL https://www.sciencedirect.com/science/article/pii/S2468502X18300068

  29. [30]

    https://deepmind.google/models/gemini-image/,

    Gemini image – nano banana — google deepmind. https://deepmind.google/models/gemini-image/,

  30. [31]

    Accessed: 2026-02-27

  31. [32] Tsu-Jui Fu, William Yang Wang, Daniel McDuff, and Yale Song. Doc2ppt: Automatic presentation slides generation from scientific documents. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pages 634–642, 2022

  32. [33] Keshav Kumar and Ravindranath Chowdary. Slidespawn: An automatic slides generation system for research publications. arXiv preprint arXiv:2411.17719, 2024

  33. [34] Yuheng Yang, Wenjia Jiang, Yang Wang, Yiwei Wang, and Chi Zhang. Auto-slides: An interactive multi-agent system for creating and customizing research presentations. arXiv preprint arXiv:2509.11062, 2025

  34. [35] Jingwei Shi, Zeyu Zhang, Biao Wu, Yanjie Liang, Meng Fang, Ling Chen, and Yang Zhao. PresentAgent: Multimodal agent for presentation video generation. In Ivan Habernal, Peter Schulam, and Jörg Tiedemann, editors, Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 760–773, Suzhou, China, Nove...

  35. [36] Alexander Meier, Mahei Manhai Li, and Roman Rietsche. Developing a hybrid vector-graph retrieval system for entity-preserving and inspiring storyline creation of presentation slides, 2025

  36. [37] X. He, Y. Zhang, L. Wang, and Q. Liu. A survey on large language models for narrative visualization. arXiv preprint arXiv:2405.12345, 2024

  37. [38] Microsoft 365 copilot | ai productivity tools for work. https://www.microsoft.com/en-us/microsoft-365-copilot, 2026. Accessed: 2026-02-26

  38. [39] Google gemini. https://gemini.google.com/app, 2026. Accessed: 2026-02-26

  39. [40] Best ai presentation maker for professional decks | beautiful.ai - generate high-quality slides with the artificial intelligence powered presentation tool available. https://www.beautiful.ai/, 2026. Accessed: 2026-02-26

  40. [41] Chin-Yew Lin. Rouge: A package for automatic evaluation of summaries. In Text summarization branches out, pages 74–81, 2004

  41. [42] Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting of the Association for Computational Linguistics, pages 311–318, 2002

  42. [43] Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q Weinberger, and Yoav Artzi. Bertscore: Evaluating text generation with bert. arXiv preprint arXiv:1904.09675, 2019

  43. [44] Yiduo Guo, Zekai Zhang, Yaobo Liang, Dongyan Zhao, and Nan Duan. Pptc benchmark: Evaluating large language models for powerpoint task completion. In Findings of the Association for Computational Linguistics: ACL 2024, pages 8682–8701, 2024

  44. [45] Ananth Muppidi, Tarak Das, Sambaran Bandyopadhyay, Tripti Shukla, et al. Taming llms with negative samples: A reference-free framework to evaluate presentation content with actionable feedback. arXiv preprint arXiv:2505.18240, 2025

  45. [46]

    Presentation Length (Duration)

  46. [47]

    Key Sections (Focus)

  47. [48]

    audience

    Style Preferences Interaction Style: ,→Be proactive. Read the paper content to guide the user. ,→Maintain context. Remember previous turns. ,→When the user provides new requirements, update your understanding. Completion Condition: When requirements are clear/confirmed, output the JSON strictly: { "audience": ""..."", "duration": ""..."", "focus_sections"...

  48. [49]

    Must include pipeline and put it at chosen[0]

  49. [50]

    Must include exactly one hook version (hook must be in chosen and must not be pipeline)

  50. [51]

    chosen": [

    Pick 2 additional distinct templates from the given pool. Return STRICT JSON ONLY: { "chosen": [""pipeline"", ""<hook>"", ""<other1>"", ""<other2>""], "hook": ""<hook>"", "reasons": {"template_id": ""one-line reason"", ...} } Listing D.1-3: System Prompt ofLogic Chain Generator Agent You are a logic-chain generator for a research paper presentation. You m...

  51. [52]

    PRIORITIZE USER INTENT: If ’User Requirements Context’ or ’total_duration’ implies a specific structure, you MUST follow it

  52. [53]

    TEMPLATES ARE REFERENCE ONLY: If a narrative template conflicts with user intent or duration constraints, you MUST compress/merge template roles to satisfy constraints

  53. [54]

    You MUST output between {min_nodes} and {max_nodes} nodes

    HARD CONSTRAINT: total_duration={duration_text}. You MUST output between {min_nodes} and {max_nodes} nodes

  54. [55]

    You may add extra roles: Hook, Takeaway, Extra

    roles should follow the focus_sections or user instructions. You may add extra roles: Hook, Takeaway, Extra

  55. [56]

    text must be the concise title of the node (<= 10 words)

  56. [57]

    description must be a detailed summary (2-3 sentences) combining paper content and user intent

  57. [58]

    type"=""sequential

    Edges: Create sequential edges ("type"=""sequential"") for the main flow. Do NOT create reference edges initially

  58. [59]

    edges": [ {

    LANGUAGE: Output node text and description in English. Listing D.1-4: System Prompt ofLogic Chain Edge Recommender You are a logic chain edge recommender assistant. Based on the given node list (ordered) and context abstract, recommend a set of directed edges and provide a short reason for each edge. Output STRICT JSON: { "edges": [ {"from": i, "to": j, "...

  59. [60]

    Return a JSON object where keys are Raw Section Names and values are the corresponding Logic Node Names

  60. [61]

    If a Raw Section implies or covers a Logic Node (even if wording differs), map it

  61. [62]

    If no match is found for a raw section, map it to null

  62. [63]

    Raw Name

    One Logic Node can be matched by multiple Raw Sections (e.g. splitting a section). Return JSON map: {"Raw Name": ""Logic Name""} D.2 Visual & Layout Generation Listing D.2-1: System Prompt ofDeck Style Agent You are a deck style director. Output ONE JSON object ONLY matching this schema: { "version": ""v1"", "persona": str, "theme": ""light""|""dark"", "p...

  63. [64]

    Search content using tools

  64. [65]

    Prefer \section{...}

    Use add_section(latex_cmd) for titles/structure. Prefer \section{...}

  65. [66]

    Ensure content is NOT a duplicate

    Use add_slide(latex_body, speech_script) to create content slides. Ensure content is NOT a duplicate

  66. [67]

    Use add_citation if needed

  67. [68]

    Keep within time limit

  68. [69]

    Style Requirements (Clean, Concise, Modern Beamer): ,→Layout: Use standard itemize or enumerate environments

    Reply DONE when finished. Style Requirements (Clean, Concise, Modern Beamer): ,→Layout: Use standard itemize or enumerate environments. Keep slides un-cluttered. ,→Content: KEY POINTS ONLY. Use bullet points. Avoid long paragraphs or walls of text. ,→Titles: ALWAYS use \frametitle{...} for every slide. ,→Figures/Tables: If a slide contains a figure or tab...

  69. [70]

    ALWAYS locate the most relevant source nodes with search_relevant_nodes, then call get_node_content

  70. [71]

    3.For slides containing a figure or table: Keep the slide minimal (title + at most 1-3 short bullets, or even no bullets)

    If the source node contains figures/tables, you MUST call get_node_media and include at least one representative figure/table. 3.For slides containing a figure or table: Keep the slide minimal (title + at most 1-3 short bullets, or even no bullets). Put detailed explanation into speech_script

  71. [72]

    If both a figure and a table are relevant, create separate slides for them

  72. [73]

    If you cannot fit the figure/table into a slide, mention it in the ‘speech_script‘ explicitly and optionally cite it

  73. [74]

    big title

    DO NOT create a "big title" inside the slide by using ‘Large/huge‘ text; instead use ‘add_section(’section{{...}}’)‘. 28 Listing D.2-4: System Prompt ofRender Plan Agent You are a senior research keynote slide designer. Task: output ONE single JSON object called a RenderPlan for ONE slide. All layout and content decisions must be made by you; no determini...

  74. [75]

    ,→Mismatch between layout/effects and actual HTML (e.g., Image Focus effect but no ROI tiles)

    Detect structural or visual failures in the slide, focusing on: ,→Missing or unused assets (hero images, table viz, diagrams). ,→Mismatch between layout/effects and actual HTML (e.g., Image Focus effect but no ROI tiles). ,→Clearly wrong layout choices (e.g., diagram_layout used when a main figure is available). ,→Empty or nearly empty content regions (no...

  75. [76]

    ,→Only touch layout / style_config / layout_config / effects_used / image.focus_template_id / diagram_spec

    Propose a SMALL patch to the RenderPlan (partial JSON) that would fix the most important problems. ,→Only touch layout / style_config / layout_config / effects_used / image.focus_template_id / diagram_spec. ,→Do NOT rewrite the actual content (title/core_message/bullets/steps) except when absolutely necessary

  76. [77]

    issues": [ {

    Provide optional notes_for_slide_agent that explains how future generations could avoid the same issue. IMPORTANT: ,→Output STRICT JSON only, matching the schema: { "issues": [ {"id": str, "severity": ""low""|""medium""|""high""|""critical"", "message": str, "hint": str, "location": object}, ... ], "suggested_plan_patch": object, "notes_for_slide_agent": ...

  77. [78]

    THINK: Analyze the error message and the context

  78. [79]

    NOTE: Log line numbers are often inaccurate for included files

    OBSERVE: Use grep_files, read_file, or check_balance to locate the error. NOTE: Log line numbers are often inaccurate for included files

  79. [80]

    ACT: Fix the error using the appropriate tools

  80. [81]

    SUCCESS", reply

    VERIFY: Call compile_pdf() to check if the error is resolved. If compile_pdf() returns "SUCCESS", reply "FIXED". If it fails, analyze the new error and repeat

Showing first 80 references.