pith.

arxiv: 2604.17894 · v1 · submitted 2026-04-20 · 💻 cs.CL

Automatic Slide Updating with User-Defined Dynamic Templates and Natural Language Instructions

Pith reviewed 2026-05-10 04:08 UTC · model grok-4.3

classification 💻 cs.CL
keywords: dynamic slide update · natural language instructions · presentation slides · agent framework · multimodal parsing · DynaSlide benchmark · tool-augmented reasoning

The pith

The paper defines a task for updating user-designed slides from natural language instructions and provides a benchmark and agent system to do it.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The authors formalize the problem of keeping diverse, user-created presentation slides current using only natural language instructions applied to the original template. They release DynaSlide, a dataset of 20,036 instruction-execution triples (source slide, instruction, target slide) built from real business reporting decks and grounded in a shared external database. They introduce SlideAgent, which parses the slide visually and textually, grounds the instruction in the slide's elements, and uses tools to update tables, charts, and conclusions while leaving layout and styling unchanged. The system offers a workable baseline and exposes open difficulties in accurate data handling and style preservation. The problem matters because manual slide maintenance consumes significant time in data-driven work.

Core claim

"Dynamic Slide Update via Natural Language Instructions on User-provided Templates" is a new task that requires an agent to modify slide content according to an instruction while strictly preserving the original layout and visual style. The DynaSlide benchmark supplies 20,036 triples of source slide, instruction, and target slide for training and evaluation. SlideAgent addresses the task by parsing the slide multimodally, grounding the instruction in the slide's elements, and calling tools for numerical and textual updates.
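To make the task constraint concrete, here is a minimal sketch, not DynaSlide's actual data format. It assumes each slide is a list of elements with a hypothetical `element_id`, bounding box, style, and content, and treats an update as template-preserving only if every element keeps its kind, geometry, and style while its content is free to change. `SlideElement`, `UpdateTriple`, and `preserves_template` are illustrative names.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class SlideElement:
    """One slide element; field names are illustrative, not DynaSlide's schema."""
    element_id: str
    kind: str                                  # e.g. "title", "table", "chart", "summary"
    bbox: tuple[float, float, float, float]    # (x, y, width, height)
    style: tuple[tuple[str, Any], ...]         # frozen style properties, e.g. (("font", "Arial"),)
    content: Any                               # text, table rows, or chart series


@dataclass
class UpdateTriple:
    """Hypothetical container for one (source slide, instruction, target slide) triple."""
    source: list[SlideElement]
    instruction: str
    target: list[SlideElement]


def preserves_template(source: list[SlideElement], output: list[SlideElement]) -> bool:
    """True if output keeps the same elements, kinds, geometry, and style as source.

    Content is deliberately not compared: changing it is the point of the update.
    """
    src = {e.element_id: e for e in source}
    out = {e.element_id: e for e in output}
    if src.keys() != out.keys():
        return False
    return all(
        src[k].kind == out[k].kind
        and src[k].bbox == out[k].bbox
        and src[k].style == out[k].style
        for k in src
    )
```

Under this check, moving a title while updating its period fails even if the new text is correct, which mirrors the requirement that layout and style stay fixed.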

What carries the argument

SlideAgent, an agent-based system that merges multimodal slide parsing, natural language instruction grounding, and tool-augmented reasoning to perform updates on tables, charts, and text.
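The tool-augmented step can be pictured as the agent choosing a tool name and arguments that ordinary code then executes, so table and chart numbers come from computation rather than from model text. The sketch below is an assumption, not SlideAgent's toolset: it registers one hypothetical tool, `area_price_cross`, that counts transactions per area × price cell (loosely in the spirit of the cross-constraint table structure in Figure 4). The record fields `area` and `total_price` and the bucket edges are illustrative.

```python
from bisect import bisect_right
from collections import Counter
from typing import Any, Callable


def bucket(value: float, edges: list[float]) -> str:
    """Map a value to a half-open range label such as '60-90' (edges must be ascending)."""
    i = bisect_right(edges, value)
    if i == 0:
        return f"<{edges[0]:g}"
    if i == len(edges):
        return f">={edges[-1]:g}"
    return f"{edges[i - 1]:g}-{edges[i]:g}"


def area_price_cross(records: list[dict], area_edges: list[float],
                     price_edges: list[float]) -> dict[tuple[str, str], int]:
    """Count transactions per (area bucket, price bucket) cell."""
    return dict(Counter(
        (bucket(r["area"], area_edges), bucket(r["total_price"], price_edges))
        for r in records
    ))


# Hypothetical registry: the agent selects a name and arguments; code does the arithmetic.
TOOLS: dict[str, Callable[..., Any]] = {"area_price_cross": area_price_cross}


def call_tool(name: str, **kwargs: Any) -> Any:
    """Dispatch a tool call by name, rejecting names outside the registry."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**kwargs)
```

With area edges `[60, 90, 120]`, a 75 m² unit falls in the `60-90` bucket and a 90 m² unit in `90-120`, since buckets are half-open.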

If this is right

  • Slides can be refreshed automatically from instructions without recreating layouts.
  • The shared database grounding allows consistent updates across related slides.
  • Evaluation protocols show specific failure modes in content accuracy and layout fidelity.
  • Future systems can build on this baseline for better performance on complex visuals.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This framework could extend to updating other visual documents like dashboards or infographics.
  • Connecting the agent directly to live databases might enable real-time slide maintenance.
  • Error patterns in the evaluations point to needs for better chart understanding modules.

Load-bearing premise

That existing multimodal and reasoning tools, when combined in an agent, can correctly interpret instructions and apply changes to arbitrary user slide designs without errors or style violations.

What would settle it

A test set of slides containing intricate charts and tables where SlideAgent produces outputs that deviate from the target slides in either data values or visual arrangement.
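One way to operationalize this test is to score each output element against its target counterpart on two axes: exact agreement of content values and overlap of bounding boxes. The sketch below is an illustrative assumption, not the paper's evaluation protocol. It uses intersection-over-union (IoU) for arrangement, a hypothetical 0.9 threshold, and plain dicts keyed by an element identifier.

```python
def iou(a: tuple[float, float, float, float], b: tuple[float, float, float, float]) -> float:
    """Intersection-over-union of two boxes given as (x, y, width, height)."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def score(output: dict[str, dict], target: dict[str, dict],
          iou_threshold: float = 0.9) -> dict[str, float]:
    """Element-level value accuracy and layout accuracy against the target slide.

    Each slide maps element_id -> {"bbox": (x, y, w, h), "value": ...}.
    Elements missing from the output count as failures on both axes.
    """
    if not target:
        raise ValueError("target slide has no elements")
    value_hits = layout_hits = 0
    for eid, tgt in target.items():
        out = output.get(eid)
        if out is None:
            continue
        value_hits += out["value"] == tgt["value"]
        layout_hits += iou(out["bbox"], tgt["bbox"]) >= iou_threshold
    n = len(target)
    return {"value_accuracy": value_hits / n, "layout_accuracy": layout_hits / n}
```

Exact equality is strict for floating-point chart values; a tolerance-based comparison may suit those elements better.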

Figures

Figures reproduced from arXiv: 2604.17894 by Jiakai He, Kun Zhou, Weijia Jia, Wenmian Yang, Yiquan Zhang, Zhensheng Wang.

Figure 1. Illustration of the template filling.
Figure 2. Overview of SlideAgent. The framework operates in two stages: (1)…
Figure 3. Element-level accuracy across closed-domain…
Figure 4. Illustration of three canonical table structures: Field-constraint, Constraint-field, and Cross-constraint.
Figure 5. The prompt used for multimodal layout parsing to predict semantic labels and bounding boxes.
Figure 6. The prompt for data source extraction, mapping slide content to database schema slots.
Figure 7. The prompt for closed-domain function logic extraction via predefined tool invocation.
Figure 8. The prompt for open-domain logic extraction, decomposing unknown logic into atomic parameters.
Figure 9. The prompt for user instruction parsing, modeled as a parameter state update task.
Figure 10. The prompt for SQL generation, converting updated parameters into executable database queries.
Figure 11. The prompt for fact-aware summary update based on data changes.
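Figures 9 and 10 describe instruction parsing as a parameter state update followed by SQL generation. A minimal sketch of that idea, under stated assumptions: the parsed instruction is already available as a dict of changed parameters (in SlideAgent a prompted model produces it), the table `transactions` and its columns are hypothetical, and the paper's model-generated SQL is swapped for a fixed parameterized query. It runs against an in-memory SQLite database filled with made-up rows.

```python
import sqlite3

# Slide state recovered from the source slide (hypothetical parameter names).
SOURCE_STATE = {"city": "CityA", "block": "Block1", "start_year": 2019, "end_year": 2021}


def apply_update(state: dict, changes: dict) -> dict:
    """Parameter state update: parameters named in the instruction override; the rest carry over."""
    unknown = set(changes) - set(state)
    if unknown:
        raise KeyError(f"instruction refers to unknown parameters: {sorted(unknown)}")
    return {**state, **changes}


def build_query(state: dict) -> tuple[str, tuple]:
    """Fixed, parameterized query for a supply/transaction-units table (hypothetical schema)."""
    sql = (
        "SELECT year, SUM(supply_units), SUM(sold_units) FROM transactions "
        "WHERE city = ? AND block = ? AND year BETWEEN ? AND ? "
        "GROUP BY year ORDER BY year"
    )
    return sql, (state["city"], state["block"], state["start_year"], state["end_year"])


def demo_db() -> sqlite3.Connection:
    """Tiny in-memory table with made-up rows, for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transactions (city TEXT, block TEXT, year INT, supply_units INT, sold_units INT)"
    )
    conn.executemany(
        "INSERT INTO transactions VALUES (?, ?, ?, ?, ?)",
        [("CityA", "Block1", y, 100 + y % 10, 80 + y % 10) for y in range(2019, 2025)],
    )
    return conn
```

An instruction such as "change the period to 2022-2024", parsed as `{"start_year": 2022, "end_year": 2024}`, yields one row per year from 2022 through 2024 while `city` and `block` carry over from the source slide. Binding values through `?` placeholders, rather than splicing them into the SQL string, keeps instruction text from altering the query itself.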
Original abstract

Presentation slides are a primary medium for data-driven reporting, yet keeping complex, analytics-style decks up to date remains labor-intensive. Existing automation methods mostly follow fixed template filling and cannot support dynamic updates for diverse, user-authored slide decks. We therefore define "Dynamic Slide Update via Natural Language Instructions on User-provided Templates" and introduce DynaSlide, a large-scale benchmark with 20,036 real-world instruction-execution triples (source slide, user instruction, target slide) grounded in a shared external database and built from business reporting slides under bring-your-own-template (BYO-template) conditions. To tackle this task, we propose SlideAgent, an agent-based framework that combines multimodal slide parsing, natural language instruction grounding, and tool-augmented reasoning for tables, charts, and textual conclusions. SlideAgent updates content while preserving layout and style, providing a strong reference baseline on DynaSlide. We further design end-to-end and component-level evaluation protocols that reveal key challenges and opportunities for future research. The dataset and code are available at https://github.com/XiaoZhou2024/SlideAgent.
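Textual conclusions are where a stale number most easily survives an update. Below is a minimal sketch of a fact-aware summary update, assuming the conclusion is stored as a template whose slots are recomputed from the refreshed data rather than rewritten freely. The template wording is modeled on the paper's summary templates; the slot names, the trend phrasing, and `summary_facts` are hypothetical choices.

```python
from string import Template

SUMMARY = Template(
    "Mainstream types concentrate in the $dominant_segment segment, totaling "
    "$dominant_volume units; transactions $trend from $start_year to $end_year."
)


def summary_facts(segment_counts: dict[str, int], yearly_units: dict[int, int]) -> dict[str, object]:
    """Recompute every number the summary mentions from the updated data."""
    if not segment_counts or len(yearly_units) < 2:
        raise ValueError("need segment counts and at least two years of data")
    dominant = max(segment_counts, key=segment_counts.get)  # first segment wins on ties
    first, last = min(yearly_units), max(yearly_units)
    base, terminal = yearly_units[first], yearly_units[last]
    if base == 0:
        raise ValueError("base-period volume is zero; percent change is undefined")
    change = (terminal - base) / base * 100
    if change > 0:
        trend = f"rose {change:.1f}%"
    elif change < 0:
        trend = f"fell {-change:.1f}%"
    else:
        trend = "were flat"
    return {
        "dominant_segment": dominant,
        "dominant_volume": segment_counts[dominant],
        "trend": trend,
        "start_year": first,
        "end_year": last,
    }
```

For segment counts `{"<90": 120, "90-120": 340, ">=120": 95}` and yearly volumes rising from 400 to 450, `SUMMARY.substitute(...)` names the 90-120 segment with 340 units and reports a 12.5% rise.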

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper defines the task of 'Dynamic Slide Update via Natural Language Instructions on User-provided Templates' and introduces DynaSlide, a benchmark of 20,036 real-world instruction-execution triples (source slide, instruction, target slide) constructed from business reporting slides under bring-your-own-template conditions and grounded in an external database. It proposes SlideAgent, an agent-based framework combining multimodal slide parsing, natural language instruction grounding, and tool-augmented reasoning for tables, charts, and text to update content while preserving layout and style, positions it as a strong baseline, and describes end-to-end and component-level evaluation protocols. The dataset and code are released.

Significance. If the framework's performance claims hold under rigorous validation, the work would be significant for practical automation of data-driven slide maintenance and for advancing multimodal agent research in document editing. The large-scale, real-world benchmark construction and public release of data/code are clear strengths that enable reproducibility and follow-on work.

major comments (2)
  1. [Evaluation protocols] Evaluation protocols section: the manuscript describes end-to-end and component-level evaluation protocols but supplies no quantitative results, error analysis, or explicit factual-accuracy audit against ground-truth target slides. This is load-bearing for the central claim that SlideAgent constitutes a 'strong reference baseline,' because any mismatch between parsed slide state and database state or errors in the agent's reasoning trace would directly falsify reliable update performance on diverse BYO-template decks.
  2. [SlideAgent framework] SlideAgent framework description: the claim that combining existing multimodal parsing, instruction grounding, and tool use reliably preserves layout/style without content errors or hallucinations rests on an untested assumption. No ablation studies or concrete evidence are provided to show that off-the-shelf components suffice for the diverse, user-authored slides in DynaSlide.
minor comments (1)
  1. [Abstract] Abstract: the phrase 'providing a strong reference baseline' is stated without reference to any specific metrics or results; adding a brief indication of key findings would improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive review. The comments highlight important areas for strengthening the manuscript's claims about evaluation and the framework. We address each major comment below and commit to revisions that incorporate quantitative results, error analysis, and ablations.

Point-by-point responses
  1. Referee: [Evaluation protocols] Evaluation protocols section: the manuscript describes end-to-end and component-level evaluation protocols but supplies no quantitative results, error analysis, or explicit factual-accuracy audit against ground-truth target slides. This is load-bearing for the central claim that SlideAgent constitutes a 'strong reference baseline,' because any mismatch between parsed slide state and database state or errors in the agent's reasoning trace would directly falsify reliable update performance on diverse BYO-template decks.

    Authors: We agree that the absence of quantitative results, error analysis, and factual-accuracy audits weakens the support for claiming SlideAgent as a strong baseline. The current manuscript emphasizes the design of the protocols and the benchmark but does not present the actual performance numbers or audits. In the revised version, we will add the full quantitative results from our end-to-end and component-level evaluations (including accuracy metrics against ground-truth target slides), a detailed error analysis, and discussion of any parsing or reasoning mismatches. This will directly address the concern about potential falsification of performance claims. revision: yes

  2. Referee: [SlideAgent framework] SlideAgent framework description: the claim that combining existing multimodal parsing, instruction grounding, and tool use reliably preserves layout/style without content errors or hallucinations rests on an untested assumption. No ablation studies or concrete evidence are provided to show that off-the-shelf components suffice for the diverse, user-authored slides in DynaSlide.

    Authors: We acknowledge that the framework section relies on the integration of existing components without sufficient ablations or concrete evidence for the diverse BYO-template slides. While component-level evaluations are described, they do not include systematic ablations. In the revision, we will add ablation studies (e.g., removing individual modules like tool-augmented reasoning or multimodal parsing) and provide concrete examples and metrics demonstrating layout/style preservation, content accuracy, and handling of hallucinations on DynaSlide slides. This will replace the assumption with empirical support. revision: yes

Circularity Check

0 steps flagged

No circularity: task definition, external benchmark, and baseline framework are independent of self-referential inputs

full rationale

The paper defines the task of Dynamic Slide Update via Natural Language Instructions on User-provided Templates, constructs DynaSlide as 20,036 real-world triples (source slide, instruction, target slide) drawn from external business reporting slides under BYO-template conditions and grounded in a shared external database, and proposes SlideAgent as an agent framework that assembles existing multimodal parsing, instruction grounding, and tool-augmented reasoning components to serve as a reference baseline. Evaluation protocols are applied to this independently sourced benchmark. No equations, fitted parameters renamed as predictions, load-bearing self-citations, uniqueness theorems, or ansatzes appear in the derivation; the central claims rest on empirical results against external data rather than reducing to the paper's own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

The work rests on standard assumptions about the capabilities of current multimodal language models and tool-use agents rather than introducing new mathematical axioms or fitted parameters.

axioms (2)
  • domain assumption: Multimodal models can accurately parse layout, text, tables, and charts from user-authored slides.
    Invoked as the foundation for the parsing step in SlideAgent.
  • domain assumption: Tool-augmented reasoning can correctly interpret natural language instructions and apply changes to slide content.
    Core premise of the agent framework description.
invented entities (2)
  • DynaSlide benchmark (no independent evidence)
    purpose: Large-scale dataset of instruction-execution triples for the defined task.
    Newly constructed and released in this paper; no independent evidence provided beyond the authors' collection process.
  • SlideAgent framework (no independent evidence)
    purpose: Agent system that performs dynamic slide updates.
    Proposed architecture combining parsing, grounding, and tools; no external validation cited.

pith-pipeline@v0.9.0 · 5505 in / 1559 out tokens · 41552 ms · 2026-05-10T04:08:52.174700+00:00 · methodology

