DeepSlide: From Artifacts to Presentation Delivery
Pith reviewed 2026-05-19 17:58 UTC · model grok-4.3
The pith
DeepSlide is a multi-agent system that plans time-budgeted narratives and generates synced slides and scripts to improve delivery while matching visual quality.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
DeepSlide is a human-in-the-loop multi-agent system that supports the full presentation process, from requirement elicitation and time-budgeted narrative planning to evidence-grounded slide-script generation, attention augmentation, and rehearsal support. It integrates a controllable logical-chain planner with per-node time budgets, a lightweight content-tree retriever for grounding, Markov-style sequential rendering with style inheritance, and sandboxed execution with minimal repair to ensure renderability. Evaluation on a dual-scoreboard benchmark across twenty domains shows that it matches strong baselines on artifact quality while achieving larger gains on delivery metrics, including narrative flow, pacing precision, slide-script synergy, and attention guidance.
What carries the argument
A controllable logical-chain planner with per-node time budgets that structures the narrative and enforces pacing precision during generation.
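To make the idea of per-node time budgets concrete, here is a minimal sketch of one way a planner could split a talk's total duration across the nodes of a logical chain. It is not the authors' algorithm: the `Node` class, the `weight` field, the `min_seconds` floor, and the proportional split are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    title: str
    weight: float      # hypothetical relative importance of the node
    seconds: int = 0   # filled in by allocate_budgets


def allocate_budgets(nodes, total_seconds, min_seconds=20):
    """Split total_seconds across nodes in proportion to weight.

    Every node receives at least min_seconds; the budgets sum to total_seconds.
    """
    if not nodes:
        return []
    if min_seconds * len(nodes) > total_seconds:
        raise ValueError("total duration is too short for this many nodes")
    total_weight = sum(n.weight for n in nodes)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")

    spare = total_seconds - min_seconds * len(nodes)
    shares = [spare * n.weight / total_weight for n in nodes]
    budgets = [min_seconds + int(s) for s in shares]

    # Hand out the seconds lost to rounding down, largest fraction first.
    leftover = total_seconds - sum(budgets)
    order = sorted(range(len(nodes)),
                   key=lambda i: shares[i] - int(shares[i]),
                   reverse=True)
    for i in order[:leftover]:
        budgets[i] += 1

    for node, budget in zip(nodes, budgets):
        node.seconds = budget
    return nodes


chain = allocate_budgets(
    [Node("Hook", 1), Node("Method", 2), Node("Takeaway", 1)], 600)
for node in chain:
    print(node.title, node.seconds)
```

Keeping the budgets integral and summing exactly to the requested duration is a design choice of this sketch; it makes a pacing check as simple as comparing the planned and spoken time per node.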
Load-bearing premise
The dual-scoreboard benchmark cleanly separates static artifact quality from dynamic delivery excellence without overlap or bias in the evaluation metrics.
What would settle it
A head-to-head user study in which independent raters score the same source content delivered by DeepSlide versus baseline generators and find no advantage or a reversal on narrative flow, pacing precision, or attention guidance scores.
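One simple way to analyse such a head-to-head study is an exact sign test on paired ratings. The sketch below is an illustration only: the function name, the example ratings, and the choice of a sign test are assumptions, and the paper does not say which test, if any, it would use.

```python
from math import comb


def sign_test(system_scores, baseline_scores):
    """Two-sided exact sign test on paired ratings; ties are dropped.

    Returns (wins, losses, p_value), where a win means the system
    was rated higher than the baseline on the same item.
    """
    if len(system_scores) != len(baseline_scores):
        raise ValueError("ratings must be paired")
    wins = sum(a > b for a, b in zip(system_scores, baseline_scores))
    losses = sum(a < b for a, b in zip(system_scores, baseline_scores))
    n = wins + losses
    if n == 0:
        return wins, losses, 1.0
    k = min(wins, losses)
    # Probability of a split at least this lopsided under a fair coin.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return wins, losses, min(1.0, 2 * tail)


# Hypothetical pacing ratings for five talks, same source content.
print(sign_test([4, 5, 3, 4, 5], [3, 4, 3, 2, 4]))
```

A reversal would show up as more losses than wins, and "no advantage" as a p-value that stays large as the number of rated talks grows.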
Original abstract
Presentations are a primary medium for scholarly communication, yet most AI slide generators optimize the artifact (a visually plausible deck) while under-optimizing the delivery process (pacing, narrative, and presentation preparation). We present DeepSlide, a human-in-the-loop multi-agent system that supports preparing the full presentation process, from requirement elicitation and time-budgeted narrative planning, to evidence-grounded slide-script generation, attention augmentation, and rehearsal support. DeepSlide integrates (i) a controllable logical-chain planner with per-node time budgets, (ii) a lightweight content-tree retriever for grounding, (iii) Markov-style sequential rendering with style inheritance, and (iv) sandboxed execution with minimal repair to ensure renderability. We further introduce a dual-scoreboard benchmark that cleanly separates static artifact quality from dynamic delivery excellence. Across 20 domains and diverse audience profiles, DeepSlide matches strong baselines on artifact quality while consistently achieving larger gains on delivery metrics, improving narrative flow, pacing precision, and slide-script synergy with clearer attention guidance.
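The abstract does not spell out what "Markov-style sequential rendering with style inheritance" means in code. One plausible reading, sketched here under that assumption, is that each slide is rendered from the previous slide's resolved style only, inheriting it and applying local overrides. The `Style` fields, the `style_overrides` key, and `render_deck` are hypothetical names.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Style:
    theme: str = "light"
    accent: str = "#1f77b4"
    font_size: int = 20


def render_deck(slides, base_style):
    """Resolve slide styles in order.

    Each slide depends only on the style of the slide before it
    (the Markov-style assumption), then applies its own overrides.
    """
    rendered = []
    previous = base_style
    for slide in slides:
        overrides = slide.get("style_overrides", {})
        style = replace(previous, **overrides)  # inherit, then override
        rendered.append({"title": slide["title"], "style": style})
        previous = style
    return rendered


deck = render_deck(
    [
        {"title": "Motivation"},
        {"title": "Method", "style_overrides": {"theme": "dark"}},
        {"title": "Results"},  # inherits the dark theme from "Method"
    ],
    Style(),
)
for slide in deck:
    print(slide["title"], slide["style"].theme)
```

Under this reading a style change propagates forward until another slide overrides it, which keeps the deck visually consistent without passing the whole history to every rendering step.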
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents DeepSlide, a human-in-the-loop multi-agent system for full presentation preparation, integrating a controllable logical-chain planner with per-node time budgets, a lightweight content-tree retriever for grounding, Markov-style sequential rendering with style inheritance, and sandboxed execution for renderability. It introduces a dual-scoreboard benchmark to separate static artifact quality from dynamic delivery excellence and claims that, across 20 domains and diverse audience profiles, DeepSlide matches strong baselines on artifact quality while achieving larger gains on delivery metrics including narrative flow, pacing precision, slide-script synergy, and attention guidance.
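As an illustration of what a "lightweight content-tree retriever" could look like, the sketch below walks a tree of paper sections and ranks nodes by keyword overlap with a query. The `ContentNode` class, the `find_nodes` function, and the overlap scoring are assumptions for this example; the manuscript's actual retriever is not described in the text reviewed here.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ContentNode:
    title: str
    text: str = ""
    children: list = field(default_factory=list)


def _tokens(s):
    """Lowercased alphanumeric word set."""
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def find_nodes(root, query, top_k=3):
    """Return paths of the top_k nodes sharing the most words with query."""
    wanted = _tokens(query)
    scored = []
    stack = [(root, root.title)]
    while stack:
        node, path = stack.pop()
        overlap = len(wanted & _tokens(node.title + " " + node.text))
        if overlap:
            scored.append((overlap, path))
        for child in node.children:
            stack.append((child, path + " > " + child.title))
    # Highest overlap first; ties broken alphabetically by path.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [path for _, path in scored[:top_k]]


paper = ContentNode("Paper", children=[
    ContentNode("Method", "logical chain planner with time budgets"),
    ContentNode("Evaluation", "dual scoreboard benchmark across domains"),
])
print(find_nodes(paper, "benchmark scoreboard"))
```

Returning a path rather than raw text keeps the grounding auditable: a slide can cite exactly which section of the source supplied its content.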
Significance. If the empirical results hold under a validated benchmark, the work could meaningfully advance AI-assisted scholarly communication by shifting focus from static slide artifacts to the full delivery process. The integration of planning, retrieval, and rehearsal components represents a practical step toward more usable presentation tools, and the dual-scoreboard idea, if shown to be non-confounded, would be a useful methodological contribution for future evaluations in this area.
major comments (1)
- [Benchmark and Evaluation] The central empirical claim, that DeepSlide matches baselines on artifact quality but shows larger gains on delivery metrics, depends on the dual-scoreboard benchmark cleanly isolating static slide quality from dynamic aspects without overlap or bias. The manuscript provides no description of metric definitions, orthogonality tests between the two scoreboards, or controls for potential leakage (e.g., clearer artifact slides enabling better pacing and synergy by construction). This is load-bearing for interpreting the differential-gain result and must be addressed with explicit validation.
minor comments (1)
- [Abstract] The abstract states performance gains but does not include any quantitative results, baseline details, or statistical tests; moving a concise summary of key numbers into the abstract would improve readability.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and for recognizing the potential of DeepSlide to advance AI-assisted scholarly communication by emphasizing the full delivery process. We address the major comment on the dual-scoreboard benchmark below.
Point-by-point responses
- Referee: The central empirical claim, that DeepSlide matches baselines on artifact quality but shows larger gains on delivery metrics, depends on the dual-scoreboard benchmark cleanly isolating static slide quality from dynamic aspects without overlap or bias. The manuscript provides no description of metric definitions, orthogonality tests between the two scoreboards, or controls for potential leakage (e.g., clearer artifact slides enabling better pacing and synergy by construction). This is load-bearing for interpreting the differential-gain result and must be addressed with explicit validation.
  Authors: We agree that explicit validation of the separation is essential to support the differential-gain interpretation. In the revised manuscript we will expand the benchmark description (currently in Section 4) with (i) precise definitions and scoring procedures for every metric on both the artifact and delivery scoreboards, (ii) quantitative orthogonality analysis (Pearson and Spearman correlations across the 20 domains), and (iii) leakage-control experiments that evaluate delivery metrics on fixed baseline artifacts and artifact metrics on fixed scripts. These additions will directly address the concern about confounding and provide the requested explicit validation.
  Revision: yes
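The orthogonality analysis promised in the rebuttal could be computed along the following lines. This is a dependency-free sketch, not the authors' analysis code: the function names and the example scores are invented for illustration, and a real analysis would use the per-domain benchmark scores.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(vx * vy)


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical per-domain scores on the two scoreboards.
artifact = [3.9, 4.1, 4.0, 3.7, 4.3]
delivery = [4.4, 3.8, 4.6, 4.1, 4.0]
print(pearson(artifact, delivery), spearman(artifact, delivery))
```

Coefficients near zero would support the claim that the two scoreboards measure different things, while a strong positive correlation would suggest the leakage the referee worries about.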
Circularity Check
No circularity in derivation chain
The manuscript presents an engineering system (multi-agent pipeline with planner, retriever, sequential rendering, and sandboxed execution) plus an introduced dual-scoreboard benchmark. No equations, fitted parameters, or derivations appear in the provided text. The central empirical claim rests on cross-domain comparisons rather than any quantity that reduces by construction to its own inputs or to a self-citation chain. The benchmark's separation of artifact versus delivery metrics is asserted but not shown to be tautological; it functions as an external evaluation protocol whose validity is open to independent verification. This is the normal case of a self-contained systems paper whose claims do not collapse into definitional equivalence.
Axiom & Free-Parameter Ledger
invented entities (1)
- dual-scoreboard benchmark: no independent evidence
Reference graph
Works this paper leans on
- [1] Alan Kelly. How Scientists Communicate: Dispatches from the Frontiers of Knowledge. 09 2020. ISBN 9780190936600. doi: 10.1093/oso/9780190936600.001.0001
- [2] Michael Alley. The craft of scientific presentations: Critical steps to succeed and critical errors to avoid. Physics Today, 57, 07 2004. doi: 10.1063/1.1784305
- [3] Jiaxin Ge, Zora Zhiruo Wang, Xuhui Zhou, Yi-Hao Peng, Sanjay Subramanian, Qinyue Tan, Maarten Sap, Alane Suhr, Daniel Fried, Graham Neubig, et al. Autopresent: Designing structured visuals from scratch. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 2902–2911, 2025
- [4] Himanshu Maheshwari, Sambaran Bandyopadhyay, Aparna Garimella, and Anandhavelu Natarajan. Presentations are not always linear! gnn meets llm for document-to-presentation transformation with attribution. arXiv preprint arXiv:2405.13095, 2024
- [5] Ishani Mondal, Shwetha S, Anandhavelu Natarajan, Aparna Garimella, Sambaran Bandyopadhyay, and Jordan Boyd-Graber. Presentations by the humans and for the humans: Harnessing LLMs for generating persona-aware slides from documents. In Yvette Graham and Matthew Purver, editors, Proceedings of the 18th Conference of the European Chapter of the Association for...
- [6] langchain-ai/langchain: The platform for reliable agents. https://github.com/langchain-ai/langchain
- [8] Significant-gravitas/autogpt: Autogpt is the vision of accessible ai for everyone, to use and to build on. our mission is to provide the tools, so that you can focus on what matters. https://github.com/Significant-Gravitas/AutoGPT, 2026. Accessed: 2026-02-26
- [9] microsoft/autogen: A programming framework for agentic ai. https://github.com/microsoft/autogen
- [10] Accessed: 2026-02-26
- [11] Guohao Li, Hasan Abed Al Kader Hammoud, Hani Itani, Dmitrii Khizbullin, and Bernard Ghanem. Camel: Communicative agents for "mind" exploration of large language model society. In Thirty-seventh Conference on Neural Information Processing Systems, 2023
- [12] Sirui Hong, Mingchen Zhuge, Jonathan Chen, Xiawu Zheng, Yuheng Cheng, Jinlin Wang, Ceyao Zhang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, Liyang Zhou, Chenyu Ran, Lingfeng Xiao, Chenglin Wu, and Jürgen Schmidhuber. MetaGPT: Meta programming for a multi-agent collaborative framework. In The Twelfth International Conference on Learning Representations, 2...
- [13]
- [14]
- [15] Google notebooklm. https://notebooklm.google/, 2026. Accessed: 2026-02-26
- [16]
- [17] Coze: Next-gen ai app developing platform. https://www.coze.com/, 2026. Accessed: 2026-02-26
- [18] Hao Zheng, Xinyan Guan, Hao Kong, Wenkai Zhang, Jia Zheng, Weixiang Zhou, Hongyu Lin, Yaojie Lu, Xianpei Han, and Le Sun. PPTAgent: Generating and evaluating presentations beyond text-to-slides. In Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, and Violet Peng, editors, Proceedings of the 2025 Conference on Empirical Methods in Natural Lang...
- [19] Richard E. Mayer. Coherence Principle, page 113–133. Cambridge University Press, 2001
- [20] Tushar Aggarwal and Aarohi Bhand. Pass: Presentation automation for slide generation and speech. arXiv preprint arXiv:2501.06497, 2025
- [21] Xiaojie Xu, Xinli Xu, Sirui Chen, Haoyu Chen, Fan Zhang, and Ying-Cong Chen. Pregenie: An agentic framework for high-quality visual presentation generation. arXiv preprint arXiv:2505.21660, 2025
- [22] Isabel Cachola, Silviu Cucerzan, Allen Herring, Vuksan Mijovic, Erik Oveson, and Sujay Kumar Jauhar. Knowledge-centric templatic views of documents. arXiv preprint arXiv:2401.06945, 2024
- [23] Yunqiao Yang, Wenbo Li, Houxing Ren, Zimu Lu, Ke Wang, Zhiyuan Huang, Zhuofan Zong, Mingjie Zhan, and Hongsheng Li. Slidesgen-bench: Evaluating slides generation via computational and quantitative metrics. arXiv preprint arXiv:2601.09487, 2026
- [24] Michael Ofengenden, Yunze Man, Ziqi Pang, and Yu-Xiong Wang. Pptarena: A benchmark for agentic powerpoint editing. arXiv preprint arXiv:2512.03042, 2025
- [25] Zheng Huang, Xukai Liu, Tianyu Hu, Kai Zhang, and Ye Liu. Pptbench: Towards holistic evaluation of large language models for powerpoint layout and design understanding. arXiv preprint arXiv:2512.02624, 2025
- [26] Qwen Team. Qwen3.5: Towards native multimodal agents, February 2026. URL https://qwen.ai/blog?id=qwen3.5
- [27] Siyi Zhou, Yiquan Zhou, Yi He, Xun Zhou, Jinchao Wang, Wei Deng, and Jingchen Shu. Indextts2: A breakthrough in emotionally expressive and duration-controlled auto-regressive zero-shot text-to-speech. arXiv preprint arXiv:2506.21619, 2025
- [28] Dayuanjiang/next-ai-draw-io: A next.js web application that integrates ai capabilities with draw.io diagrams. this app allows you to create, modify, and enhance diagrams through natural language commands and ai-assisted visualization. https://github.com/DayuanJiang/next-ai-draw-io, 2026. Accessed: 2026-02-27
- [29] Deqing Li, Honghui Mei, Yi Shen, Shuang Su, Wenli Zhang, Junting Wang, Ming Zu, and Wei Chen. Echarts: A declarative framework for rapid construction of web-based visualization. Visual Informatics, 2(2):136–146, 2018. ISSN 2468-502X. doi: https://doi.org/10.1016/j.visinf.2018.04.011. URL https://www.sciencedirect.com/science/article/pii/S2468502X18300068
- [30] Gemini image – nano banana — google deepmind. https://deepmind.google/models/gemini-image/
- [31] Accessed: 2026-02-27
- [32] Tsu-Jui Fu, William Yang Wang, Daniel McDuff, and Yale Song. Doc2ppt: Automatic presentation slides generation from scientific documents. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pages 634–642, 2022
- [33] Keshav Kumar and Ravindranath Chowdary. Slidespawn: An automatic slides generation system for research publications. arXiv preprint arXiv:2411.17719, 2024
- [34] Yuheng Yang, Wenjia Jiang, Yang Wang, Yiwei Wang, and Chi Zhang. Auto-slides: An interactive multi-agent system for creating and customizing research presentations. arXiv preprint arXiv:2509.11062, 2025
- [35] Jingwei Shi, Zeyu Zhang, Biao Wu, Yanjie Liang, Meng Fang, Ling Chen, and Yang Zhao. PresentAgent: Multimodal agent for presentation video generation. In Ivan Habernal, Peter Schulam, and Jörg Tiedemann, editors, Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 760–773, Suzhou, China, Nove...
- [36] Alexander Meier, Mahei Manhai Li, and Roman Rietsche. Developing a hybrid vector-graph retrieval system for entity-preserving and inspiring storyline creation of presentation slides, 2025
- [37]
- [38] Microsoft 365 copilot | ai productivity tools for work. https://www.microsoft.com/en-us/microsoft-365-copilot, 2026. Accessed: 2026-02-26
- [39] Google gemini. https://gemini.google.com/app, 2026. Accessed: 2026-02-26
- [40] Best ai presentation maker for professional decks | beautiful.ai - generate high-quality slides with the artificial intelligence powered presentation tool available. https://www.beautiful.ai/, 2026. Accessed: 2026-02-26
- [41] Chin-Yew Lin. Rouge: A package for automatic evaluation of summaries. In Text summarization branches out, pages 74–81, 2004
- [42] Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting of the Association for Computational Linguistics, pages 311–318, 2002
- [43] Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q Weinberger, and Yoav Artzi. Bertscore: Evaluating text generation with bert. arXiv preprint arXiv:1904.09675, 2019
- [44] Yiduo Guo, Zekai Zhang, Yaobo Liang, Dongyan Zhao, and Nan Duan. Pptc benchmark: Evaluating large language models for powerpoint task completion. In Findings of the Association for Computational Linguistics: ACL 2024, pages 8682–8701, 2024
- [45] Ananth Muppidi, Tarak Das, Sambaran Bandyopadhyay, Tripti Shukla, et al. Taming llms with negative samples: A reference-free framework to evaluate presentation content with actionable feedback. arXiv preprint arXiv:2505.18240, 2025
- [46] Presentation Length (Duration)
- [47] Key Sections (Focus)
- [48] Style Preferences
  Interaction Style:
  - Be proactive. Read the paper content to guide the user.
  - Maintain context. Remember previous turns.
  - When the user provides new requirements, update your understanding.
  Completion Condition: When requirements are clear/confirmed, output the JSON strictly: { "audience": "...", "duration": "...", "focus_sections"...
- [49] Must include pipeline and put it at chosen[0]
- [50] Must include exactly one hook version (hook must be in chosen and must not be pipeline)
- [51] Pick 2 additional distinct templates from the given pool. Return STRICT JSON ONLY: { "chosen": ["pipeline", "<hook>", "<other1>", "<other2>"], "hook": "<hook>", "reasons": {"template_id": "one-line reason", ...} }
  Listing D.1-3: System Prompt of Logic Chain Generator Agent
  You are a logic-chain generator for a research paper presentation. You m...
- [52] PRIORITIZE USER INTENT: If 'User Requirements Context' or 'total_duration' implies a specific structure, you MUST follow it
- [53] TEMPLATES ARE REFERENCE ONLY: If a narrative template conflicts with user intent or duration constraints, you MUST compress/merge template roles to satisfy constraints
- [54] HARD CONSTRAINT: total_duration={duration_text}. You MUST output between {min_nodes} and {max_nodes} nodes
- [55] roles should follow the focus_sections or user instructions. You may add extra roles: Hook, Takeaway, Extra
- [56] text must be the concise title of the node (<= 10 words)
- [57] description must be a detailed summary (2-3 sentences) combining paper content and user intent
- [58] Edges: Create sequential edges ("type"="sequential") for the main flow. Do NOT create reference edges initially
- [59] LANGUAGE: Output node text and description in English.
  Listing D.1-4: System Prompt of Logic Chain Edge Recommender
  You are a logic chain edge recommender assistant. Based on the given node list (ordered) and context abstract, recommend a set of directed edges and provide a short reason for each edge. Output STRICT JSON: { "edges": [ {"from": i, "to": j, "...
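The generator rules above (node count between `min_nodes` and `max_nodes`, titles of at most 10 words, a non-empty description, and sequential edges only) lend themselves to a mechanical check on the model's output. The following is a hedged sketch of such a check. The overall shape of the chain object (`nodes` and `edges` lists, `from`/`to` indices on edges) is an assumption pieced together from the fragments quoted here, not the paper's documented schema.

```python
def validate_logic_chain(chain, min_nodes, max_nodes):
    """Return a list of problems; an empty list means the chain passes."""
    problems = []
    nodes = chain.get("nodes", [])
    edges = chain.get("edges", [])

    if not (min_nodes <= len(nodes) <= max_nodes):
        problems.append(
            f"expected {min_nodes}-{max_nodes} nodes, got {len(nodes)}")

    for i, node in enumerate(nodes):
        if len(str(node.get("text", "")).split()) > 10:
            problems.append(f"node {i}: title longer than 10 words")
        if not str(node.get("description", "")).strip():
            problems.append(f"node {i}: missing description")

    # The main flow should be exactly 0 -> 1 -> ... -> n-1.
    expected = {(i, i + 1) for i in range(len(nodes) - 1)}
    sequential = {(e.get("from"), e.get("to"))
                  for e in edges if e.get("type") == "sequential"}
    if sequential != expected:
        problems.append("sequential edges do not form the main flow")
    if any(e.get("type") != "sequential" for e in edges):
        problems.append("only sequential edges are allowed initially")
    return problems


chain = {
    "nodes": [
        {"text": "Why delivery matters", "description": "Motivates the gap."},
        {"text": "DeepSlide pipeline", "description": "Walks the stages."},
    ],
    "edges": [{"from": 0, "to": 1, "type": "sequential"}],
}
print(validate_logic_chain(chain, min_nodes=2, max_nodes=5))
```

A check like this would let a pipeline reject or re-prompt on malformed output before any slide is rendered; the 2-3 sentence rule for descriptions is left out because sentence counting is too fuzzy to enforce reliably.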
- [60] Return a JSON object where keys are Raw Section Names and values are the corresponding Logic Node Names
- [61] If a Raw Section implies or covers a Logic Node (even if wording differs), map it
- [62] If no match is found for a raw section, map it to null
- [63] One Logic Node can be matched by multiple Raw Sections (e.g. splitting a section). Return JSON map: {"Raw Name": "Logic Name"}
  D.2 Visual & Layout Generation
  Listing D.2-1: System Prompt of Deck Style Agent
  You are a deck style director. Output ONE JSON object ONLY matching this schema: { "version": "v1", "persona": str, "theme": "light"|"dark", "p...
- [64] Search content using tools
- [65] Use add_section(latex_cmd) for titles/structure. Prefer \section{...}
- [66] Use add_slide(latex_body, speech_script) to create content slides. Ensure content is NOT a duplicate
- [67] Use add_citation if needed
- [68] Keep within time limit
- [69] Reply DONE when finished.
  Style Requirements (Clean, Concise, Modern Beamer):
  - Layout: Use standard itemize or enumerate environments. Keep slides un-cluttered.
  - Content: KEY POINTS ONLY. Use bullet points. Avoid long paragraphs or walls of text.
  - Titles: ALWAYS use \frametitle{...} for every slide.
  - Figures/Tables: If a slide contains a figure or tab...
- [70] ALWAYS locate the most relevant source nodes with search_relevant_nodes, then call get_node_content
- [71] If the source node contains figures/tables, you MUST call get_node_media and include at least one representative figure/table.
  3. For slides containing a figure or table: Keep the slide minimal (title + at most 1-3 short bullets, or even no bullets). Put detailed explanation into speech_script
- [72] If both a figure and a table are relevant, create separate slides for them
- [73] If you cannot fit the figure/table into a slide, mention it in the `speech_script` explicitly and optionally cite it
- [74] DO NOT create a "big title" inside the slide by using `Large/huge` text; instead use `add_section('section{{...}}')`.
  Listing D.2-4: System Prompt of Render Plan Agent
  You are a senior research keynote slide designer. Task: output ONE single JSON object called a RenderPlan for ONE slide. All layout and content decisions must be made by you; no determini...
- [75] Detect structural or visual failures in the slide, focusing on:
  - Missing or unused assets (hero images, table viz, diagrams).
  - Mismatch between layout/effects and actual HTML (e.g., Image Focus effect but no ROI tiles).
  - Clearly wrong layout choices (e.g., diagram_layout used when a main figure is available).
  - Empty or nearly empty content regions (no...
- [76] Propose a SMALL patch to the RenderPlan (partial JSON) that would fix the most important problems.
  - Only touch layout / style_config / layout_config / effects_used / image.focus_template_id / diagram_spec.
  - Do NOT rewrite the actual content (title/core_message/bullets/steps) except when absolutely necessary
- [77] Provide optional notes_for_slide_agent that explains how future generations could avoid the same issue.
  IMPORTANT:
  - Output STRICT JSON only, matching the schema: { "issues": [ {"id": str, "severity": "low"|"medium"|"high"|"critical", "message": str, "hint": str, "location": object}, ... ], "suggested_plan_patch": object, "notes_for_slide_agent": ...
- [78] THINK: Analyze the error message and the context
- [79] OBSERVE: Use grep_files, read_file, or check_balance to locate the error. NOTE: Log line numbers are often inaccurate for included files
- [80] ACT: Fix the error using the appropriate tools
- [81] VERIFY: Call compile_pdf() to check if the error is resolved. If compile_pdf() returns "SUCCESS", reply "FIXED". If it fails, analyze the new error and repeat
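The THINK, OBSERVE, ACT, VERIFY steps above describe a bounded compile-and-repair loop. A minimal sketch of that control flow follows, with the compiler and the fixer passed in as plain callables. Only the `compile_pdf` name and the "SUCCESS" and "FIXED" strings come from the prompt; `propose_fix`, `max_rounds`, and the "UNRESOLVED" result are assumptions added for illustration.

```python
def repair_until_compiles(compile_pdf, propose_fix, max_rounds=3):
    """Compile; on failure apply one fix per round, up to max_rounds.

    compile_pdf() returns "SUCCESS" or an error log string.
    propose_fix(log) is a hypothetical hook that edits the source in place.
    """
    log = compile_pdf()
    for _ in range(max_rounds):
        if log == "SUCCESS":
            return "FIXED"
        propose_fix(log)      # THINK / OBSERVE / ACT happen inside the hook
        log = compile_pdf()   # VERIFY
    return "FIXED" if log == "SUCCESS" else "UNRESOLVED"


# Toy stand-ins: a document with two errors, each fix removes one.
state = {"errors": 2}


def fake_compile():
    return "SUCCESS" if state["errors"] == 0 else f"{state['errors']} errors"


def fake_fix(log):
    state["errors"] -= 1


print(repair_until_compiles(fake_compile, fake_fix))
```

Capping the number of rounds is what keeps the repair "minimal": the loop either converges quickly or reports failure instead of rewriting the deck indefinitely.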