DeepSlide: From Artifacts to Presentation Delivery

arXiv: 2605.15202 · v1 · submitted 2026-04-01 · cs.AI · cs.CL · cs.IR

Ming Yang, Zhiwei Zhang, Jiahang Li, Haoseng Liu, Yuzheng Cai, Weiguo Zheng

Pith reviewed 2026-05-19 17:58 UTC · model grok-4.3

classification cs.AI · cs.CL · cs.IR

keywords AI presentation generation · multi-agent systems · narrative planning · slide-script generation · presentation delivery · benchmark evaluation · human-in-the-loop AI

The pith

DeepSlide is a multi-agent system that plans time-budgeted narratives and generates synced slides and scripts to improve delivery while matching visual quality.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Most AI slide tools focus on creating visually plausible decks but leave pacing, narrative structure, and script synergy to the user. DeepSlide addresses this by supporting the full process from requirement gathering through rehearsal, using a logical-chain planner that assigns time budgets to each narrative step. It adds a content retriever for grounding claims, sequential rendering that inherits styles, and sandboxed execution to keep outputs renderable. A new dual-scoreboard benchmark measures static slide quality separately from dynamic delivery aspects such as flow and attention guidance. Across twenty domains the system matches strong baselines on appearance yet shows larger gains on delivery metrics.

Core claim

DeepSlide is a human-in-the-loop multi-agent system that supports the full presentation process, from requirement elicitation and time-budgeted narrative planning to evidence-grounded slide-script generation, attention augmentation, and rehearsal support. It integrates a controllable logical-chain planner with per-node time budgets, a lightweight content-tree retriever for grounding, Markov-style sequential rendering with style inheritance, and sandboxed execution with minimal repair to ensure renderability. Evaluation on a dual-scoreboard benchmark across twenty domains shows it matches strong baselines on artifact quality while achieving larger gains on delivery metrics, including narrative flow, pacing precision, and slide-script synergy.
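One possible reading of "Markov-style sequential rendering with style inheritance" is that each slide is rendered from its own content plus the style of the slide immediately before it, and nothing earlier. The sketch below illustrates that reading only. The `Style` fields, the `render_deck` function, and the override mechanism are hypothetical names chosen for illustration and are not taken from the paper.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Style:
    """Hypothetical style record; the paper's real style schema is not shown here."""
    theme: str = "light"
    accent: str = "#1f77b4"
    font_pt: int = 20


def render_deck(contents, overrides, base=Style()):
    """Render slides in order; slide i sees only slide i-1's style (first-order Markov)."""
    rendered, prev = [], base
    for i, text in enumerate(contents):
        # Inherit the previous slide's style, then apply this slide's explicit overrides.
        style = replace(prev, **overrides.get(i, {}))
        rendered.append((text, style))
        prev = style
    return rendered


# Slide 1 changes the accent colour; slide 2 inherits that change without restating it.
deck = render_deck(["Title", "Method", "Results"], overrides={1: {"accent": "#d62728"}})
```

Under this assumption a style change made on one slide propagates forward until another slide overrides it, which keeps a deck visually consistent without a global style pass.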

What carries the argument

A controllable logical-chain planner with per-node time budgets that structures the narrative and enforces pacing precision during generation.
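To make the idea of per-node time budgets concrete, here is a minimal sketch of how a total talk duration might be split across narrative steps. It assumes each node carries a relative weight and that budgets are proportional to those weights. The `ChainNode` fields, the `allocate_budgets` function, and the proportional rule are illustrative assumptions, not the planner described in the paper.

```python
from dataclasses import dataclass


@dataclass
class ChainNode:
    """One narrative step; all field names are hypothetical."""
    title: str
    weight: float      # relative importance, e.g. set by the planner or the user
    budget_s: int = 0  # seconds allotted to this step


def allocate_budgets(nodes, total_s):
    """Split total_s across nodes in proportion to weight, summing exactly to total_s."""
    total_weight = sum(n.weight for n in nodes)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    raw = [total_s * n.weight / total_weight for n in nodes]
    floors = [int(r) for r in raw]
    # Give the seconds lost to rounding down to the largest fractional remainders.
    leftover = total_s - sum(floors)
    order = sorted(range(len(nodes)), key=lambda i: raw[i] - floors[i], reverse=True)
    for i in order[:leftover]:
        floors[i] += 1
    for node, seconds in zip(nodes, floors):
        node.budget_s = seconds
    return nodes


# A 10-minute talk (600 s) split over four hypothetical narrative steps.
chain = allocate_budgets(
    [ChainNode("Hook", 1.0), ChainNode("Method", 3.0),
     ChainNode("Results", 2.0), ChainNode("Takeaway", 1.0)],
    total_s=600,
)
```

Largest-remainder rounding is used so the per-node budgets always add up to the requested duration, which is the property a pacing check would depend on.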

Load-bearing premise

The dual-scoreboard benchmark cleanly separates static artifact quality from dynamic delivery excellence without overlap or bias in the evaluation metrics.

What would settle it

A head-to-head user study in which independent raters score the same source content delivered by DeepSlide versus baseline generators and find no advantage or a reversal on narrative flow, pacing precision, or attention guidance scores.

Figures

Figures reproduced from arXiv: 2605.15202 by Haoseng Liu, Jiahang Li, Ming Yang, Weiguo Zheng, Yuzheng Cai, Zhiwei Zhang.

**Figure 2.** Limitations of existing approaches and the …

**Figure 3.** Overview. Stage 1: requirement elicitation and narrative proposal; Stage 2: logical chain editing and evidence-grounded generation; Stage 3: interactive slide refinement and attention-oriented augmentation; Stage 4: rehearsal and dual-scoreboard evaluation. Effects in Stage 3: Image Focus, Text to Diagram, Keynote, Data Visualization, Motion, Background, Auto Layout (Qwen3.5 [24] as example).

**Figure 4.** Requirement elicitation and narrative proposal (Stage 1).

**Figure 7.** Secondary experiment on audience-specific evaluation.

**Figure 8.** DeepSlide vs. Manus, varying audience {BS, MS, PhD} and duration {5, 10, 15} (Case 1).

**Figure 9.** Varying retrieval depth.

**Figure 10.** DeepSlide’s UI and attention-augmented effect preview.

Original abstract

Presentations are a primary medium for scholarly communication, yet most AI slide generators optimize the artifact (a visually plausible deck) while under-optimizing the delivery process (pacing, narrative, and presentation preparation). We present DeepSlide, a human-in-the-loop multi-agent system that supports preparing the full presentation process, from requirement elicitation and time-budgeted narrative planning, to evidence-grounded slide-script generation, attention augmentation, and rehearsal support. DeepSlide integrates (i) a controllable logical-chain planner with per-node time budgets, (ii) a lightweight content-tree retriever for grounding, (iii) Markov-style sequential rendering with style inheritance, and (iv) sandboxed execution with minimal repair to ensure renderability. We further introduce a dual-scoreboard benchmark that cleanly separates static artifact quality from dynamic delivery excellence. Across 20 domains and diverse audience profiles, DeepSlide matches strong baselines on artifact quality while consistently achieving larger gains on delivery metrics, improving narrative flow, pacing precision, and slide-script synergy with clearer attention guidance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

DeepSlide shifts slide generation toward full delivery support with time-budgeted planning and a dual-scoreboard benchmark, but the abstract supplies no numbers or validation to back the performance claims.

The main point is that DeepSlide tries to handle the full presentation process rather than stopping at slide creation, but the abstract gives no numbers to show it works better. It brings together a logical chain planner that assigns time budgets to parts of the narrative, a retriever for grounding content, sequential rendering, and support for rehearsal. The dual-scoreboard benchmark is meant to measure both how good the slides look and how well the delivery goes, like pacing and how the script matches the slides. This setup is new in combining those pieces with explicit attention to delivery metrics and human-in-the-loop elements. It does a solid job describing a practical pipeline that could help people prepare talks more effectively.

The weak part is the lack of any actual performance numbers, baseline comparisons, or details on how the benchmark was tested. The claim of larger gains on delivery across 20 domains sounds promising but can't be checked without the data. There's also a risk that the delivery scores are not fully independent from the artifact quality, since better slides might naturally lead to better pacing.

This paper would interest people who build or use AI tools for creating presentations in academic or professional settings. A reader working on similar systems could pick up ideas on planning with time constraints or evaluation methods. It has enough of a new angle to deserve a serious referee who can look at the full experiments and methods. I would send it to peer review but ask the authors to add the missing quantitative results and any tests for metric independence right away.

Referee Report

1 major / 1 minor

Summary. The manuscript presents DeepSlide, a human-in-the-loop multi-agent system for full presentation preparation, integrating a controllable logical-chain planner with per-node time budgets, a lightweight content-tree retriever for grounding, Markov-style sequential rendering with style inheritance, and sandboxed execution for renderability. It introduces a dual-scoreboard benchmark to separate static artifact quality from dynamic delivery excellence and claims that, across 20 domains and diverse audience profiles, DeepSlide matches strong baselines on artifact quality while achieving larger gains on delivery metrics including narrative flow, pacing precision, slide-script synergy, and attention guidance.

Significance. If the empirical results hold under a validated benchmark, the work could meaningfully advance AI-assisted scholarly communication by shifting focus from static slide artifacts to the full delivery process. The integration of planning, retrieval, and rehearsal components represents a practical step toward more usable presentation tools, and the dual-scoreboard idea, if shown to be non-confounded, would be a useful methodological contribution for future evaluations in this area.

major comments (1)

[Benchmark and Evaluation] The central empirical claim—that DeepSlide matches baselines on artifact quality but shows larger gains on delivery metrics—depends on the dual-scoreboard benchmark cleanly isolating static slide quality from dynamic aspects without overlap or bias. The manuscript provides no description of metric definitions, orthogonality tests between the two scoreboards, or controls for potential leakage (e.g., clearer artifact slides enabling better pacing and synergy by construction). This is load-bearing for interpreting the differential-gain result and must be addressed with explicit validation.

minor comments (1)

[Abstract] The abstract states performance gains but does not include any quantitative results, baseline details, or statistical tests; moving a concise summary of key numbers into the abstract would improve readability.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback and for recognizing the potential of DeepSlide to advance AI-assisted scholarly communication by emphasizing the full delivery process. We address the major comment on the dual-scoreboard benchmark below.

Referee: The central empirical claim—that DeepSlide matches baselines on artifact quality but shows larger gains on delivery metrics—depends on the dual-scoreboard benchmark cleanly isolating static slide quality from dynamic aspects without overlap or bias. The manuscript provides no description of metric definitions, orthogonality tests between the two scoreboards, or controls for potential leakage (e.g., clearer artifact slides enabling better pacing and synergy by construction). This is load-bearing for interpreting the differential-gain result and must be addressed with explicit validation.

Authors: We agree that explicit validation of the separation is essential to support the differential-gain interpretation. In the revised manuscript we will expand the benchmark description (currently in Section 4) with (i) precise definitions and scoring procedures for every metric on both the artifact and delivery scoreboards, (ii) quantitative orthogonality analysis (Pearson and Spearman correlations across the 20 domains), and (iii) leakage-control experiments that evaluate delivery metrics on fixed baseline artifacts and artifact metrics on fixed scripts. These additions will directly address the concern about confounding and provide the requested explicit validation.

Revision: yes
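The orthogonality analysis promised in the rebuttal could take roughly the following shape: compute Pearson and Spearman correlations between per-domain artifact scores and per-domain delivery scores. This is a minimal standard-library sketch, not the authors' procedure. The score lists are invented placeholders and are not results from the paper. The function names are illustrative, and in practice `scipy.stats.pearsonr` and `scipy.stats.spearmanr` would be the usual choice.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based average ranks, so tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


# Placeholder per-domain scores, invented for illustration; NOT data from the paper.
artifact = [3.0, 4.0, 5.0, 6.0, 7.0]
delivery = [1.0, 4.0, 9.0, 16.0, 49.0]
r_pearson = pearson(artifact, delivery)
r_spearman = spearman(artifact, delivery)
```

Correlations near zero across domains would support the claim that the two scoreboards measure different things, while values near one would point to the overlap the referee is worried about.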

Circularity Check

0 steps flagged

No circularity in derivation chain

The manuscript presents an engineering system (multi-agent pipeline with planner, retriever, sequential rendering, and sandboxed execution) plus an introduced dual-scoreboard benchmark. No equations, fitted parameters, or derivations appear in the provided text. The central empirical claim rests on cross-domain comparisons rather than any quantity that reduces by construction to its own inputs or to a self-citation chain. The benchmark's separation of artifact versus delivery metrics is asserted but not shown to be tautological; it functions as an external evaluation protocol whose validity is open to independent verification. This is the normal case of a self-contained systems paper whose claims do not collapse into definitional equivalence.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

Review based on abstract only; no explicit free parameters, axioms, or invented entities beyond the system components and benchmark are described. The dual-scoreboard benchmark is treated as a new evaluation construct without independent validation details.

invented entities (1)

dual-scoreboard benchmark (no independent evidence)
purpose: To cleanly separate static artifact quality from dynamic delivery excellence in evaluation
Introduced as a new measurement approach in the abstract to support claims of larger gains on delivery metrics.

pith-pipeline@v0.9.0 · 5723 in / 1255 out tokens · 52526 ms · 2026-05-19T17:58:19.195575+00:00 · methodology

Reference graph

Works this paper leans on

94 extracted references · 94 canonical work pages · 1 internal anchor

[1] Alan Kelly. How Scientists Communicate: Dispatches from the Frontiers of Knowledge. 09 2020. ISBN 9780190936600. doi: 10.1093/oso/9780190936600.001.0001

[2] Michael Alley. The craft of scientific presentations: Critical steps to succeed and critical errors to avoid. Physics Today, 57, 07 2004. doi: 10.1063/1.1784305

[3] Jiaxin Ge, Zora Zhiruo Wang, Xuhui Zhou, Yi-Hao Peng, Sanjay Subramanian, Qinyue Tan, Maarten Sap, Alane Suhr, Daniel Fried, Graham Neubig, et al. Autopresent: Designing structured visuals from scratch. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 2902–2911, 2025

[4] Himanshu Maheshwari, Sambaran Bandyopadhyay, Aparna Garimella, and Anandhavelu Natarajan. Presentations are not always linear! gnn meets llm for document-to-presentation transformation with attribution. arXiv preprint arXiv:2405.13095, 2024

[5] Ishani Mondal, Shwetha S, Anandhavelu Natarajan, Aparna Garimella, Sambaran Bandyopadhyay, and Jordan Boyd-Graber. Presentations by the humans and for the humans: Harnessing LLMs for generating persona-aware slides from documents. In Yvette Graham and Matthew Purver, editors, Proceedings of the 18th Conference of the European Chapter of the Association for... doi: 10.18653/v1/2024.eacl-long.163, 2024

[6] langchain-ai/langchain: The platform for reliable agents. https://github.com/langchain-ai/langchain,

[8] Significant-gravitas/autogpt: Autogpt is the vision of accessible ai for everyone, to use and to build on. our mission is to provide the tools, so that you can focus on what matters. https://github.com/Significant-Gravitas/AutoGPT, 2026. Accessed: 2026-02-26

[9] microsoft/autogen: A programming framework for agentic ai. https://github.com/microsoft/autogen,

[10] Accessed: 2026-02-26

[11] Guohao Li, Hasan Abed Al Kader Hammoud, Hani Itani, Dmitrii Khizbullin, and Bernard Ghanem. Camel: Communicative agents for "mind" exploration of large language model society. In Thirty-seventh Conference on Neural Information Processing Systems, 2023

[12] Sirui Hong, Mingchen Zhuge, Jonathan Chen, Xiawu Zheng, Yuheng Cheng, Jinlin Wang, Ceyao Zhang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, Liyang Zhou, Chenyu Ran, Lingfeng Xiao, Chenglin Wu, and Jürgen Schmidhuber. MetaGPT: Meta programming for a multi-agent collaborative framework. In The Twelfth International Conference on Learning Representations, 2...

[13] Manus. https://manus.im/app, 2026. Accessed: 2026-02-26

[14] Gamma. https://gamma.app/, 2026. Accessed: 2026-02-26

[15] Google notebooklm. https://notebooklm.google/, 2026. Accessed: 2026-02-26

[16] Qwen. https://qwen.ai/home, 2026. Accessed: 2026-02-26

[17] Coze: Next-gen ai app developing platform. https://www.coze.com/, 2026. Accessed: 2026-02-26
[18] Hao Zheng, Xinyan Guan, Hao Kong, Wenkai Zhang, Jia Zheng, Weixiang Zhou, Hongyu Lin, Yaojie Lu, Xianpei Han, and Le Sun. PPTAgent: Generating and evaluating presentations beyond text-to-slides. In Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, and Violet Peng, editors, Proceedings of the 2025 Conference on Empirical Methods in Natural Lang... doi: 10.18653/v1/2025, 2025

[19] Richard E. Mayer. Coherence Principle, page 113–133. Cambridge University Press, 2001

[20] Tushar Aggarwal and Aarohi Bhand. Pass: Presentation automation for slide generation and speech. arXiv preprint arXiv:2501.06497, 2025

[21] Xiaojie Xu, Xinli Xu, Sirui Chen, Haoyu Chen, Fan Zhang, and Ying-Cong Chen. Pregenie: An agentic framework for high-quality visual presentation generation. arXiv preprint arXiv:2505.21660, 2025

[22] Isabel Cachola, Silviu Cucerzan, Allen Herring, Vuksan Mijovic, Erik Oveson, and Sujay Kumar Jauhar. Knowledge-centric templatic views of documents. arXiv preprint arXiv:2401.06945, 2024

[23] Yunqiao Yang, Wenbo Li, Houxing Ren, Zimu Lu, Ke Wang, Zhiyuan Huang, Zhuofan Zong, Mingjie Zhan, and Hongsheng Li. Slidesgen-bench: Evaluating slides generation via computational and quantitative metrics. arXiv preprint arXiv:2601.09487, 2026

[24] Michael Ofengenden, Yunze Man, Ziqi Pang, and Yu-Xiong Wang. Pptarena: A benchmark for agentic powerpoint editing. arXiv preprint arXiv:2512.03042, 2025

[25] Zheng Huang, Xukai Liu, Tianyu Hu, Kai Zhang, and Ye Liu. Pptbench: Towards holistic evaluation of large language models for powerpoint layout and design understanding. arXiv preprint arXiv:2512.02624, 2025

[26] Qwen Team. Qwen3.5: Towards native multimodal agents, February 2026. URL https://qwen.ai/blog?id=qwen3.5

[27] Siyi Zhou, Yiquan Zhou, Yi He, Xun Zhou, Jinchao Wang, Wei Deng, and Jingchen Shu. Indextts2: A breakthrough in emotionally expressive and duration-controlled auto-regressive zero-shot text-to-speech. arXiv preprint arXiv:2506.21619, 2025

[28] Dayuanjiang/next-ai-draw-io: A next.js web application that integrates ai capabilities with draw.io diagrams. this app allows you to create, modify, and enhance diagrams through natural language commands and ai-assisted visualization. https://github.com/DayuanJiang/next-ai-draw-io, 2026. Accessed: 2026-02-27

[29] Deqing Li, Honghui Mei, Yi Shen, Shuang Su, Wenli Zhang, Junting Wang, Ming Zu, and Wei Chen. Echarts: A declarative framework for rapid construction of web-based visualization. Visual Informatics, 2(2):136–146, 2018. ISSN 2468-502X. doi: https://doi.org/10.1016/j.visinf.2018.04.011. URL https://www.sciencedirect.com/science/article/pii/S2468502X18300068

[30] Gemini image – nano banana — google deepmind. https://deepmind.google/models/gemini-image/,

[31] Accessed: 2026-02-27
[32] Tsu-Jui Fu, William Yang Wang, Daniel McDuff, and Yale Song. Doc2ppt: Automatic presentation slides generation from scientific documents. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pages 634–642, 2022

[33] Keshav Kumar and Ravindranath Chowdary. Slidespawn: An automatic slides generation system for research publications. arXiv preprint arXiv:2411.17719, 2024

[34] Yuheng Yang, Wenjia Jiang, Yang Wang, Yiwei Wang, and Chi Zhang. Auto-slides: An interactive multi-agent system for creating and customizing research presentations. arXiv preprint arXiv:2509.11062, 2025

[35] Jingwei Shi, Zeyu Zhang, Biao Wu, Yanjie Liang, Meng Fang, Ling Chen, and Yang Zhao. PresentAgent: Multimodal agent for presentation video generation. In Ivan Habernal, Peter Schulam, and Jörg Tiedemann, editors, Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 760–773, Suzhou, China, Nove... doi: 10.18653/v1/2025.emnlp-demos.58, 2025

[36] Alexander Meier, Mahei Manhai Li, and Roman Rietsche. Developing a hybrid vector-graph retrieval system for entity-preserving and inspiring storyline creation of presentation slides, 2025

[37] X. He, Y. Zhang, L. Wang, and Q. Liu. A survey on large language models for narrative visualization. arXiv preprint arXiv:2405.12345, 2024

[38] Microsoft 365 copilot | ai productivity tools for work. https://www.microsoft.com/en-us/microsoft-365-copilot, 2026. Accessed: 2026-02-26

[39] Google gemini. https://gemini.google.com/app, 2026. Accessed: 2026-02-26

[40] Best ai presentation maker for professional decks | beautiful.ai - generate high-quality slides with the artificial intelligence powered presentation tool available. https://www.beautiful.ai/, 2026. Accessed: 2026-02-26

[41] Chin-Yew Lin. Rouge: A package for automatic evaluation of summaries. In Text summarization branches out, pages 74–81, 2004

[42] Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting of the Association for Computational Linguistics, pages 311–318, 2002

[43] Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q Weinberger, and Yoav Artzi. Bertscore: Evaluating text generation with bert. arXiv preprint arXiv:1904.09675, 2019

[44] Yiduo Guo, Zekai Zhang, Yaobo Liang, Dongyan Zhao, and Nan Duan. Pptc benchmark: Evaluating large language models for powerpoint task completion. In Findings of the Association for Computational Linguistics: ACL 2024, pages 8682–8701, 2024

[45] Ananth Muppidi, Tarak Das, Sambaran Bandyopadhyay, Tripti Shukla, et al. Taming llms with negative samples: A reference-free framework to evaluate presentation content with actionable feedback. arXiv preprint arXiv:2505.18240, 2025

Showing first 80 references.

[1] Alan Kelly. How Scientists Communicate: Dispatches from the Frontiers of Knowledge. 09 2020. ISBN 9780190936600. doi: 10.1093/oso/9780190936600.001.0001

[2] Michael Alley. The craft of scientific presentations: Critical steps to succeed and critical errors to avoid. Physics Today, 57, 07 2004. doi: 10.1063/1.1784305

[3] Jiaxin Ge, Zora Zhiruo Wang, Xuhui Zhou, Yi-Hao Peng, Sanjay Subramanian, Qinyue Tan, Maarten Sap, Alane Suhr, Daniel Fried, Graham Neubig, et al. Autopresent: Designing structured visuals from scratch. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 2902–2911, 2025

[4] Himanshu Maheshwari, Sambaran Bandyopadhyay, Aparna Garimella, and Anandhavelu Natarajan. Presentations are not always linear! gnn meets llm for document-to-presentation transformation with attribution. arXiv preprint arXiv:2405.13095, 2024

[5] Ishani Mondal, Shwetha S, Anandhavelu Natarajan, Aparna Garimella, Sambaran Bandyopadhyay, and Jordan Boyd-Graber. Presentations by the humans and for the humans: Harnessing LLMs for generating persona-aware slides from documents. In Yvette Graham and Matthew Purver, editors, Proceedings of the 18th Conference of the European Chapter of the Association for... doi: 10.18653/v1/2024.eacl-long.163

[6] langchain-ai/langchain: The platform for reliable agents. https://github.com/langchain-ai/langchain

[8] Significant-gravitas/autogpt: Autogpt is the vision of accessible ai for everyone, to use and to build on. our mission is to provide the tools, so that you can focus on what matters. https://github.com/Significant-Gravitas/AutoGPT, 2026. Accessed: 2026-02-26

[9] microsoft/autogen: A programming framework for agentic ai. https://github.com/microsoft/autogen, 2026. Accessed: 2026-02-26

[11] Guohao Li, Hasan Abed Al Kader Hammoud, Hani Itani, Dmitrii Khizbullin, and Bernard Ghanem. Camel: Communicative agents for "mind" exploration of large language model society. In Thirty-seventh Conference on Neural Information Processing Systems, 2023

[12] Sirui Hong, Mingchen Zhuge, Jonathan Chen, Xiawu Zheng, Yuheng Cheng, Jinlin Wang, Ceyao Zhang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, Liyang Zhou, Chenyu Ran, Lingfeng Xiao, Chenglin Wu, and Jürgen Schmidhuber. MetaGPT: Meta programming for a multi-agent collaborative framework. In The Twelfth International Conference on Learning Representations, 2...

[13] Manus. https://manus.im/app, 2026. Accessed: 2026-02-26

[14] Gamma. https://gamma.app/, 2026. Accessed: 2026-02-26

[15] Google notebooklm. https://notebooklm.google/, 2026. Accessed: 2026-02-26

[16] Qwen. https://qwen.ai/home, 2026. Accessed: 2026-02-26

[17] Coze: Next-gen ai app developing platform. https://www.coze.com/, 2026. Accessed: 2026-02-26

[18] Hao Zheng, Xinyan Guan, Hao Kong, Wenkai Zhang, Jia Zheng, Weixiang Zhou, Hongyu Lin, Yaojie Lu, Xianpei Han, and Le Sun. PPTAgent: Generating and evaluating presentations beyond text-to-slides. In Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, and Violet Peng, editors, Proceedings of the 2025 Conference on Empirical Methods in Natural Lang...

[19] Richard E. Mayer. Coherence Principle, page 113–133. Cambridge University Press, 2001

[20] Tushar Aggarwal and Aarohi Bhand. Pass: Presentation automation for slide generation and speech. arXiv preprint arXiv:2501.06497, 2025

[21] Xiaojie Xu, Xinli Xu, Sirui Chen, Haoyu Chen, Fan Zhang, and Ying-Cong Chen. Pregenie: An agentic framework for high-quality visual presentation generation. arXiv preprint arXiv:2505.21660, 2025

[22] Isabel Cachola, Silviu Cucerzan, Allen Herring, Vuksan Mijovic, Erik Oveson, and Sujay Kumar Jauhar. Knowledge-centric templatic views of documents. arXiv preprint arXiv:2401.06945, 2024

[23] Yunqiao Yang, Wenbo Li, Houxing Ren, Zimu Lu, Ke Wang, Zhiyuan Huang, Zhuofan Zong, Mingjie Zhan, and Hongsheng Li. Slidesgen-bench: Evaluating slides generation via computational and quantitative metrics. arXiv preprint arXiv:2601.09487, 2026

[24] Michael Ofengenden, Yunze Man, Ziqi Pang, and Yu-Xiong Wang. Pptarena: A benchmark for agentic powerpoint editing. arXiv preprint arXiv:2512.03042, 2025

[25] Zheng Huang, Xukai Liu, Tianyu Hu, Kai Zhang, and Ye Liu. Pptbench: Towards holistic evaluation of large language models for powerpoint layout and design understanding. arXiv preprint arXiv:2512.02624, 2025

[26] Qwen Team. Qwen3.5: Towards native multimodal agents, February 2026. URL https://qwen.ai/blog?id=qwen3.5

[27] Siyi Zhou, Yiquan Zhou, Yi He, Xun Zhou, Jinchao Wang, Wei Deng, and Jingchen Shu. Indextts2: A breakthrough in emotionally expressive and duration-controlled auto-regressive zero-shot text-to-speech. arXiv preprint arXiv:2506.21619, 2025

[28] Dayuanjiang/next-ai-draw-io: A next.js web application that integrates ai capabilities with draw.io diagrams. this app allows you to create, modify, and enhance diagrams through natural language commands and ai-assisted visualization. https://github.com/DayuanJiang/next-ai-draw-io, 2026. Accessed: 2026-02-27

[29] Deqing Li, Honghui Mei, Yi Shen, Shuang Su, Wenli Zhang, Junting Wang, Ming Zu, and Wei Chen. Echarts: A declarative framework for rapid construction of web-based visualization. Visual Informatics, 2(2):136–146, 2018. ISSN 2468-502X. doi: https://doi.org/10.1016/j.visinf.2018.04.011. URL https://www.sciencedirect.com/science/article/pii/S2468502X18300068

[30] Gemini image – nano banana - google deepmind. https://deepmind.google/models/gemini-image/, 2026. Accessed: 2026-02-27

[32] Tsu-Jui Fu, William Yang Wang, Daniel McDuff, and Yale Song. Doc2ppt: Automatic presentation slides generation from scientific documents. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pages 634–642, 2022

[33] Keshav Kumar and Ravindranath Chowdary. Slidespawn: An automatic slides generation system for research publications. arXiv preprint arXiv:2411.17719, 2024

[34] Yuheng Yang, Wenjia Jiang, Yang Wang, Yiwei Wang, and Chi Zhang. Auto-slides: An interactive multi-agent system for creating and customizing research presentations. arXiv preprint arXiv:2509.11062, 2025

[35] Jingwei Shi, Zeyu Zhang, Biao Wu, Yanjie Liang, Meng Fang, Ling Chen, and Yang Zhao. PresentAgent: Multimodal agent for presentation video generation. In Ivan Habernal, Peter Schulam, and Jörg Tiedemann, editors, Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 760–773, Suzhou, China, Nove... doi: 10.18653/v1/2025.emnlp-demos.58

[36] Alexander Meier, Mahei Manhai Li, and Roman Rietsche. Developing a hybrid vector-graph retrieval system for entity-preserving and inspiring storyline creation of presentation slides, 2025

[37] X. He, Y. Zhang, L. Wang, and Q. Liu. A survey on large language models for narrative visualization. arXiv preprint arXiv:2405.12345, 2024

[38] Microsoft 365 copilot | ai productivity tools for work. https://www.microsoft.com/en-us/microsoft-365-copilot, 2026. Accessed: 2026-02-26

[39] Google gemini. https://gemini.google.com/app, 2026. Accessed: 2026-02-26

[40] Best ai presentation maker for professional decks | beautiful.ai - generate high-quality slides with the artificial intelligence powered presentation tool available. https://www.beautiful.ai/, 2026. Accessed: 2026-02-26

[41] Chin-Yew Lin. Rouge: A package for automatic evaluation of summaries. In Text summarization branches out, pages 74–81, 2004

[42] Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting of the Association for Computational Linguistics, pages 311–318, 2002

[43] Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q Weinberger, and Yoav Artzi. Bertscore: Evaluating text generation with bert. arXiv preprint arXiv:1904.09675, 2019

[44] Yiduo Guo, Zekai Zhang, Yaobo Liang, Dongyan Zhao, and Nan Duan. Pptc benchmark: Evaluating large language models for powerpoint task completion. In Findings of the Association for Computational Linguistics: ACL 2024, pages 8682–8701, 2024

[45] Ananth Muppidi, Tarak Das, Sambaran Bandyopadhyay, Tripti Shukla, et al. Taming llms with negative samples: A reference-free framework to evaluate presentation content with actionable feedback. arXiv preprint arXiv:2505.18240, 2025

Table of Contents for Appendices

A System Details
B Evaluation
B.1 Baselines ...

- Presentation Length (Duration)
- Key Sections (Focus)
- Style Preferences

Interaction Style:
- Be proactive. Read the paper content to guide the user.
- Maintain context. Remember previous turns.
- When the user provides new requirements, update your understanding.

Completion Condition: When requirements are clear/confirmed, output the JSON strictly: { "audience": "...", "duration": "...", "focus_sections"...

- Must include pipeline and put it at chosen[0].
- Must include exactly one hook version (hook must be in chosen and must not be pipeline).
- Pick 2 additional distinct templates from the given pool.

Return STRICT JSON ONLY: { "chosen": ["pipeline", "<hook>", "<other1>", "<other2>"], "hook": "<hook>", "reasons": {"template_id": "one-line reason", ...} }

Listing D.1-3: System Prompt of Logic Chain Generator Agent

You are a logic-chain generator for a research paper presentation. You m...

- PRIORITIZE USER INTENT: If 'User Requirements Context' or 'total_duration' implies a specific structure, you MUST follow it.
- TEMPLATES ARE REFERENCE ONLY: If a narrative template conflicts with user intent or duration constraints, you MUST compress/merge template roles to satisfy constraints.
- HARD CONSTRAINT: total_duration={duration_text}. You MUST output between {min_nodes} and {max_nodes} nodes.
- roles should follow the focus_sections or user instructions. You may add extra roles: Hook, Takeaway, Extra.
- text must be the concise title of the node (<= 10 words).
- description must be a detailed summary (2-3 sentences) combining paper content and user intent.
- Edges: Create sequential edges ("type"="sequential") for the main flow. Do NOT create reference edges initially.

- LANGUAGE: Output node text and description in English.

Listing D.1-4: System Prompt of Logic Chain Edge Recommender

You are a logic chain edge recommender assistant. Based on the given node list (ordered) and context abstract, recommend a set of directed edges and provide a short reason for each edge. Output STRICT JSON: { "edges": [ {"from": i, "to": j, "...
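The generator constraints listed above (node-count bounds, titles of at most 10 words, sequential edges only) lend themselves to a mechanical check before a chain is accepted. The following is a minimal sketch, not the authors' implementation: the `nodes` key, the function name `validate_logic_chain`, and the requirement of exactly one edge from node i to node i+1 are assumptions; only `text`, `description`, `type`, `from`, and `to` appear in the prompts.

```python
from typing import Any


def validate_logic_chain(chain: dict[str, Any], min_nodes: int, max_nodes: int) -> list[str]:
    """Return a list of constraint violations (an empty list means the chain passes)."""
    problems: list[str] = []
    nodes = chain.get("nodes", [])  # "nodes" is an assumed key name
    edges = chain.get("edges", [])

    # HARD CONSTRAINT: node count must fall within [min_nodes, max_nodes].
    if not (min_nodes <= len(nodes) <= max_nodes):
        problems.append(f"expected {min_nodes}-{max_nodes} nodes, got {len(nodes)}")

    for i, node in enumerate(nodes):
        # "text" is the concise node title, limited to 10 words.
        if len(str(node.get("text", "")).split()) > 10:
            problems.append(f"node {i}: title longer than 10 words")
        if not str(node.get("description", "")).strip():
            problems.append(f"node {i}: missing description")

    # Only sequential edges are allowed initially; assume they link i -> i+1.
    expected = {(i, i + 1) for i in range(len(nodes) - 1)}
    seen = set()
    for edge in edges:
        if edge.get("type") != "sequential":
            problems.append(f"edge {edge.get('from')}->{edge.get('to')}: non-sequential type")
        seen.add((edge.get("from"), edge.get("to")))
    if seen != expected:
        problems.append("edges do not form a single sequential chain")
    return problems
```

Returning a list of messages rather than raising on the first failure lets a caller feed every violation back to the generator in one retry.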


- Return a JSON object where keys are Raw Section Names and values are the corresponding Logic Node Names.
- If a Raw Section implies or covers a Logic Node (even if wording differs), map it.
- If no match is found for a raw section, map it to null.
- One Logic Node can be matched by multiple Raw Sections (e.g. splitting a section).

Return JSON map: {"Raw Name": "Logic Name"}

D.2 Visual & Layout Generation

Listing D.2-1: System Prompt of Deck Style Agent

You are a deck style director. Output ONE JSON object ONLY matching this schema: { "version": "v1", "persona": str, "theme": "light"|"dark", "p...

- Search content using tools.
- Use add_section(latex_cmd) for titles/structure. Prefer \section{...}.
- Use add_slide(latex_body, speech_script) to create content slides. Ensure content is NOT a duplicate.
- Use add_citation if needed.
- Keep within time limit.
- Reply DONE when finished.

Style Requirements (Clean, Concise, Modern Beamer):
- Layout: Use standard itemize or enumerate environments. Keep slides un-cluttered.
- Content: KEY POINTS ONLY. Use bullet points. Avoid long paragraphs or walls of text.
- Titles: ALWAYS use \frametitle{...} for every slide.
- Figures/Tables: If a slide contains a figure or tab...

- ALWAYS locate the most relevant source nodes with search_relevant_nodes, then call get_node_content.
- If the source node contains figures/tables, you MUST call get_node_media and include at least one representative figure/table.
- For slides containing a figure or table: Keep the slide minimal (title + at most 1-3 short bullets, or even no bullets). Put detailed explanation into speech_script.
- If both a figure and a table are relevant, create separate slides for them.
- If you cannot fit the figure/table into a slide, mention it in the `speech_script` explicitly and optionally cite it.
- DO NOT create a "big title" inside the slide by using `Large/huge` text; instead use `add_section('section{{...}}')`.

Listing D.2-4: System Prompt of Render Plan Agent

You are a senior research keynote slide designer. Task: output ONE single JSON object called a RenderPlan for ONE slide. All layout and content decisions must be made by you; no determini...

Detect structural or visual failures in the slide, focusing on:
- Missing or unused assets (hero images, table viz, diagrams).
- Mismatch between layout/effects and actual HTML (e.g., Image Focus effect but no ROI tiles).
- Clearly wrong layout choices (e.g., diagram_layout used when a main figure is available).
- Empty or nearly empty content regions (no...

Propose a SMALL patch to the RenderPlan (partial JSON) that would fix the most important problems.
- Only touch layout / style_config / layout_config / effects_used / image.focus_template_id / diagram_spec.
- Do NOT rewrite the actual content (title/core_message/bullets/steps) except when absolutely necessary.

Provide optional notes_for_slide_agent that explains how future generations could avoid the same issue.

IMPORTANT:
- Output STRICT JSON only, matching the schema: { "issues": [ {"id": str, "severity": "low"|"medium"|"high"|"critical", "message": str, "hint": str, "location": object}, ... ], "suggested_plan_patch": object, "notes_for_slide_agent": ...
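The critic's output contract above (a list of issues with a fixed severity vocabulary, plus a patch restricted to layout-level fields) can be enforced with a small check before any patch is applied. This is a hedged sketch rather than the paper's code: `check_critic_output` is a hypothetical helper, the patch is assumed to be a dict whose top-level keys name the touched fields, and `image.focus_template_id` is assumed to be expressed as a nested `image` object.

```python
import json

SEVERITIES = {"low", "medium", "high", "critical"}
ISSUE_FIELDS = {"id": str, "severity": str, "message": str, "hint": str, "location": dict}
# Fields the prompt allows a patch to touch; "image" may only carry focus_template_id.
PATCHABLE = {"layout", "style_config", "layout_config", "effects_used", "image", "diagram_spec"}


def check_critic_output(raw: str) -> list[str]:
    """Parse the critic's reply and return a list of contract violations."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be an object"]
    problems = []
    for i, issue in enumerate(data.get("issues", [])):
        # Each issue must carry every schema field with the stated type.
        for field, expected_type in ISSUE_FIELDS.items():
            if not isinstance(issue.get(field), expected_type):
                problems.append(f"issue {i}: bad or missing '{field}'")
        if issue.get("severity") not in SEVERITIES:
            problems.append(f"issue {i}: unknown severity")
    patch = data.get("suggested_plan_patch", {})
    for key in patch:
        if key not in PATCHABLE:
            problems.append(f"patch touches disallowed field '{key}'")
    if set(patch.get("image", {})) - {"focus_template_id"}:
        problems.append("patch may only change image.focus_template_id")
    return problems
```

Rejecting a patch that touches content fields such as the title mirrors the prompt's instruction that the critic should adjust layout and effects, not rewrite the slide.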


- THINK: Analyze the error message and the context.
- OBSERVE: Use grep_files, read_file, or check_balance to locate the error. NOTE: Log line numbers are often inaccurate for included files.
- ACT: Fix the error using the appropriate tools.

- VERIFY: Call compile_pdf() to check if the error is resolved. If compile_pdf() returns "SUCCESS", reply "FIXED". If it fails, analyze the new error and repeat.
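The THINK / OBSERVE / ACT / VERIFY steps describe a bounded repair loop around compilation. A minimal sketch of that control flow follows. It is an illustration under stated assumptions, not the authors' agent: `compile_pdf` and `propose_fix` are injected callables standing in for the real tool and the LLM-driven steps, `compile_pdf` is assumed to return the error text when it does not return "SUCCESS", and the `max_rounds` cap is something the prompt does not specify.

```python
from typing import Callable


def repair_until_compiles(
    compile_pdf: Callable[[], str],
    propose_fix: Callable[[str], None],
    max_rounds: int = 5,
) -> str:
    """Run the verify/fix cycle; return "FIXED" on success, else "UNRESOLVED"."""
    result = compile_pdf()  # initial VERIFY establishes the current error, if any
    for _ in range(max_rounds):
        if result == "SUCCESS":
            return "FIXED"
        # THINK + OBSERVE + ACT are collapsed into one hypothetical callback
        # that receives the latest compiler output and edits the sources.
        propose_fix(result)
        result = compile_pdf()  # VERIFY again after each attempted fix
    return "FIXED" if result == "SUCCESS" else "UNRESOLVED"
```

Capping the number of rounds keeps a stubborn error from looping forever, which is one plausible reading of "minimal repair" but remains a design choice of this sketch.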
