pith. sign in

arxiv: 2606.02915 · v1 · pith:54BAW5B4new · submitted 2026-06-01 · 💻 cs.CV

Any2Poster: Any-Source Poster Generation Across Modalities and Domains

Pith reviewed 2026-06-28 14:52 UTC · model grok-4.3

classification 💻 cs.CV
keywords poster generationmultimodal inputsbenchmark evaluationAI agentlayout planningvisual feedbackcontent condensationcross-domain generation
0
0 comments X

The pith

A single agent generates posters from any input source across modalities and domains using parsing, planning, and visual feedback.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Any2Poster Bench, a new evaluation resource that tests poster generation from eight input types including PDFs, videos, notebooks, and documents, across five content domains. It pairs each source with quiz questions that check factual retention and interpretive understanding, plus judgments of layout and visual quality. The authors also present Any2Poster Agent, an end-to-end system that reads the source, selects key content, designs the layout, renders the poster, and improves it through iterative visual checks. This combination moves evaluation beyond earlier paper-only or narrow-domain tests and supplies both a reusable benchmark and a working baseline system.

Core claim

Any2Poster Bench enables reproducible measurement of poster generation from heterogeneous sources by combining quiz probes for information fidelity with VLM assessments of visual communication; Any2Poster Agent demonstrates the feasibility of this approach by automatically handling parsing of diverse inputs, content organization, layout planning, rendering, and refinement, achieving consistent performance across modalities and domains.

What carries the argument

Any2Poster Agent, an end-to-end pipeline that parses heterogeneous sources, organizes salient content, plans layouts, renders posters, and refines them using visual feedback.

If this is right

  • Poster generation systems can be evaluated on non-text inputs such as videos and notebooks rather than papers alone.
  • Performance can be tracked separately for factual accuracy and for layout or readability quality.
  • Iterative visual feedback loops become a standard component in multimodal content creation pipelines.
  • Domain-general agents become testable baselines for tasks that require condensing dense information into visual form.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same agent structure could extend to generating other condensed visual formats such as infographics or slide decks from mixed sources.
  • Quiz-based evaluation could transfer to measuring retention from other AI-generated summaries or diagrams.
  • If the refinement step scales, it suggests a general pattern for agents that improve outputs through self-critique on visual tasks.

Load-bearing premise

VLM judgments of visual quality together with quiz probes provide a valid measure of information fidelity and visual communication quality.

What would settle it

A controlled human study in which viewers answer the benchmark quiz questions after seeing only the generated poster, compared against answers from the original source.

Figures

Figures reproduced from arXiv: 2606.02915 by Aiden Li, Amogh Vinaykumar, Shilong Liu, Suozhi Huang.

Figure 1
Figure 1. Figure 1: Positioning of Any2Poster Bench and representative generation systems along input [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Overview of Any2Poster Bench. The benchmark covers eight input modalities and five content domains, and evaluates generated posters through BenchQuiz, a genre-aware multiple-choice reading-comprehension protocol answered by VLM readers. evaluating both quiz-based information recovery and VLM-as-judge visual communication quality across multiple modalities and content domains. 3 Any2Poster Bench We introduc… view at source ↗
Figure 3
Figure 3. Figure 3: Overview of Any2Poster Agent. The pipeline has three main components: (1) Unified Parsing, which converts heterogeneous inputs into a shared structured representation; (2) Content￾adaptive Poster Planning, which analyzes source content, assigns panel roles and layout modes, and decides whether to preserve or generate visuals; and (3) HTML/CSS-based Rendering, which compiles an editable poster and applies V… view at source ↗
Figure 4
Figure 4. Figure 4: Qualitative examples comparing generated posters from different systems on the same [PITH_FULL_IMAGE:figures/full_fig_p019_4.png] view at source ↗
read the original abstract

Visual posters are a compact medium for communicating dense information, yet progress on automatic poster generation remains difficult to measure because existing evaluations are often restricted to paper-only inputs, narrow domains, or surface-level visual similarity. We introduce Any2Poster Bench, a benchmark for any-source poster generation that evaluates systems across eight input modalities--PDFs, URLs, PPTX, DOCX, Markdown, LaTeX, notebooks, and videos--and five content domains. Any2Poster Bench pairs each source with quiz-based probes of verbatim factual retention and interpretive understanding, together with VLM-based judgments of visual quality, layout, readability, content completeness, and logical flow, enabling reproducible assessment of both information fidelity and visual communication. To instantiate and validate this benchmark, we further present Any2Poster Agent, an end-to-end reference agent that parses heterogeneous sources, organizes salient content, plans poster layouts, renders posters, and iteratively refines them using visual feedback. On Any2Poster Bench, Any2Poster Agent achieves 87.25% average accuracy across input modalities and 87.28% across content domains. On PaperQuiz-style evaluation, where prior paper-to-poster agents are directly comparable, Any2Poster Agent improves over PosterAgent-4o from 51.06-51.33% to 72.58% overall accuracy and from 116-121 to 145.16 in density-augmented score. Together, Any2Poster Bench and Any2Poster Agent provide a reusable evaluation resource and a competitive baseline for studying multimodal, domain-general poster generation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper introduces Any2Poster Bench, a benchmark for any-source poster generation across eight input modalities (PDFs, URLs, PPTX, DOCX, Markdown, LaTeX, notebooks, videos) and five content domains. The benchmark uses quiz-based probes for factual retention and interpretive understanding alongside VLM-based judgments of visual quality, layout, readability, content completeness, and logical flow. It also presents Any2Poster Agent, an end-to-end reference agent for parsing sources, organizing content, planning layouts, rendering, and iterative refinement, reporting 87.25% average accuracy across modalities, 87.28% across domains, and improvements over PosterAgent-4o (51.06-51.33% to 72.58% overall accuracy; 116-121 to 145.16 density-augmented score) on PaperQuiz-style evaluation.

Significance. If the VLM judgments prove reliable, the benchmark and agent would supply a standardized, reproducible resource for multimodal poster generation that addresses gaps in prior paper-only or narrow-domain evaluations, with the reported gains providing a competitive baseline.

major comments (1)
  1. [§3–4] §3–4 (benchmark construction, as referenced in abstract): The central performance claims (87.25% modality accuracy, 72.58% PaperQuiz accuracy) rest on VLM-based judgments of visual quality/layout/readability/completeness/flow and quiz probes, yet no human correlation, inter-annotator agreement, or ablation linking VLM scores to viewer comprehension/preference is reported. This is load-bearing because the skeptic concern directly questions whether these automated signals validly measure poster effectiveness.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive comment on evaluation validity. We address the concern point-by-point below.

read point-by-point responses
  1. Referee: [§3–4] §3–4 (benchmark construction, as referenced in abstract): The central performance claims (87.25% modality accuracy, 72.58% PaperQuiz accuracy) rest on VLM-based judgments of visual quality/layout/readability/completeness/flow and quiz probes, yet no human correlation, inter-annotator agreement, or ablation linking VLM scores to viewer comprehension/preference is reported. This is load-bearing because the skeptic concern directly questions whether these automated signals validly measure poster effectiveness.

    Authors: The 87.25% modality and 72.58% PaperQuiz accuracy figures are computed exclusively from the quiz probes, which are manually authored factual and interpretive questions; these provide an objective, human-grounded signal of information retention that does not depend on VLM output. VLM judgments are used only for the separate visual-quality, layout, readability, completeness, and flow dimensions. We acknowledge that the manuscript does not report a new human correlation study or inter-annotator agreement for the VLM component. In revision we will add a dedicated paragraph in §4 discussing VLM reliability, citing prior work on VLM-human alignment for layout and visual tasks, and we will include a small-scale human preference correlation on a held-out subset of posters. revision: yes

Circularity Check

0 steps flagged

No circularity; benchmark and performance claims are self-contained without reduction to fitted inputs or self-citations.

full rationale

The paper introduces Any2Poster Bench (quiz probes + VLM judgments) and Any2Poster Agent as a reference system, then reports accuracy numbers on that benchmark. No equations, fitted parameters, or derivation steps appear in the provided text. The central claims rest on external evaluation signals (quizzes and VLM rubrics) rather than any quantity that reduces by construction to the agent's own outputs or prior self-citations. Self-citation load-bearing, ansatz smuggling, and renaming patterns are absent. The evaluation methodology is stated explicitly and does not collapse into tautology.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

Only the abstract is available, so the ledger records the evaluation assumptions stated there. The central performance claims rest on the validity of the introduced benchmark and its metrics rather than on any derivation from prior equations.

axioms (2)
  • domain assumption VLM-based judgments reliably measure visual quality, layout, readability, content completeness, and logical flow
    The benchmark construction in the abstract relies on these judgments for reproducible assessment.
  • domain assumption Quiz-based probes accurately capture verbatim factual retention and interpretive understanding
    The abstract states that each source is paired with such probes to evaluate information fidelity.
invented entities (2)
  • Any2Poster Bench no independent evidence
    purpose: Benchmark for any-source poster generation across modalities and domains
    New evaluation resource introduced in the abstract.
  • Any2Poster Agent no independent evidence
    purpose: End-to-end reference agent for parsing sources, planning layouts, rendering, and refining posters
    New baseline system presented in the abstract.

pith-pipeline@v0.9.1-grok · 5821 in / 1572 out tokens · 32536 ms · 2026-06-28T14:52:40.593902+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

45 extracted references · 14 canonical work pages · 4 internal anchors

  1. [1]

    Croissant: A metadata format for ml-ready datasets, 2024

    Mubashara Akhtar, Omar Benjelloun, Costanza Conforti, Luca Foschini, Joan Giner-Miguelez, et al. Croissant: A metadata format for ml-ready datasets, 2024

  2. [2]

    Lawrence Zitnick, and Devi Parikh

    Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C. Lawrence Zitnick, and Devi Parikh. Vqa: Visual question answering. InProceedings of the IEEE International Conference on Computer Vision, 2015

  3. [3]

    Qwen2.5-VL technical report, 2025

    Shuai Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, et al. Qwen2.5-VL technical report, 2025

  4. [4]

    Enhancing presentation slide generation by llms with a multi-staged end-to-end approach

    Sambaran Bandyopadhyay, Himanshu Maheshwari, Anandhavelu Natarajan, and Apoorv Sax- ena. Enhancing presentation slide generation by llms with a multi-staged end-to-end approach. InProceedings of the 17th International Natural Language Generation Conference, pages 222–229. Association for Computational Linguistics, 2024

  5. [5]

    Nougat: Neural Optical Understanding for Academic Documents

    Lukas Blecher, Guillem Cucurull, Thomas Scialom, and Robert Stojnic. Nougat: Neural optical understanding for academic documents.arXiv preprint arXiv:2308.13418, 2023

  6. [6]

    POSTA: A go-to framework for customized artistic poster generation

    Haoyu Chen, Xiaojie Xu, Wenbo Li, Jingjing Ren, Tian Ye, Songhua Liu, Ying-Cong Chen, Lei Zhu, and Xinchao Wang. POSTA: A go-to framework for customized artistic poster generation. arXiv preprint arXiv:2503.14908, 2025

  7. [7]

    Gemini 2.5: Pushing the frontier with advanced reasoning, multi- modality, long context, and next generation agentic capabilities, 2025

    Gheorghe Comanici et al. Gemini 2.5: Pushing the frontier with advanced reasoning, multi- modality, long context, and next generation agentic capabilities, 2025

  8. [8]

    PAL: Program-aided language models

    Luyu Gao, Aman Madaan, Shuyan Zhou, Uri Alon, Pengfei Liu, Yiming Yang, Jamie Callan, and Graham Neubig. PAL: Program-aided language models. InProceedings of the 40th International Conference on Machine Learning, 2023

  9. [9]

    Datasheets for datasets.Communications of the ACM, 64(12):86–92, 2021

    Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daumé III, and Kate Crawford. Datasheets for datasets.Communications of the ACM, 64(12):86–92, 2021

  10. [10]

    Paper2Slides: From paper to presentation in one click

    HKUDS. Paper2Slides: From paper to presentation in one click. https://github.com/ HKUDS/Paper2Slides, 2025. Software project

  11. [11]

    Hudson and Christopher D

    Drew A. Hudson and Christopher D. Manning. Gqa: A new dataset for real-world visual reasoning and compositional question answering. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6700–6709, 2019

  12. [12]

    Text2Poster: Laying out stylized texts on retrieved images

    Chuhao Jin, Hongteng Xu, Ruihua Song, and Zhiwu Lu. Text2Poster: Laying out stylized texts on retrieved images. InIEEE International Conference on Acoustics, Speech and Signal Processing, pages 4823–4827. IEEE, 2022

  13. [13]

    WebDraw: A machine learning-driven tool for automatic website prototyping.Science of Computer Programming, 233:103056, 2024

    Thisaranie Kaluarachchi and Manjusri Wickramasinghe. WebDraw: A machine learning-driven tool for automatic website prototyping.Science of Computer Programming, 233:103056, 2024

  14. [14]

    Relation-aware diffusion model for controllable poster layout generation.arXiv preprint arXiv:2306.09086, 2023

    Fengheng Li, An Liu, Wei Feng, Honghe Zhu, Yaoyu Li, Zheng Zhang, Jingjing Lv, Xin Zhu, Junjie Shen, Zhangang Lin, and Jingping Shao. Relation-aware diffusion model for controllable poster layout generation.arXiv preprint arXiv:2306.09086, 2023

  15. [15]

    Planning and rendering: Towards product poster generation with diffusion models.arXiv preprint arXiv:2312.08822, 2024

    Zhaochen Li, Fengheng Li, Wei Feng, Honghe Zhu, Yaoyu Li, Zheng Zhang, Jingjing Lv, Junjie Shen, Zhangang Lin, Jingping Shao, and Zhenglu Yang. Planning and rendering: Towards product poster generation with diffusion models.arXiv preprint arXiv:2312.08822, 2024

  16. [16]

    Holistic evaluation of language models, 2022

    Percy Liang, Rishi Bommasani, Tony Lee, Dimitris Tsipras, Dilara Soylu, Michihiro Yasunaga, Yian Zhang, Deepak Narayanan, Yuhuai Wu, Ananya Kumar, et al. Holistic evaluation of language models, 2022

  17. [17]

    LayoutPrompter: Awaken the design ability of large language models.arXiv preprint arXiv:2311.06495, 2023

    Jiawei Lin, Jiaqi Guo, Shizhao Sun, Zijiang James Yang, Jian-Guang Lou, and Dongmei Zhang. LayoutPrompter: Awaken the design ability of large language models.arXiv preprint arXiv:2311.06495, 2023. 10

  18. [18]

    G-Eval: Nlg evaluation using gpt-4 with better human alignment

    Yang Liu, Dan Iter, Yichong Xu, Shuohang Wang, Ruochen Xu, and Chenguang Zhu. G-Eval: Nlg evaluation using gpt-4 with better human alignment. InProceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 2511–2522, 2023

  19. [19]

    Nikolaos Livathinos, Christoph Auer, Maksym Lysak, Ahmed Nassar, Michele Dolfi, Panos Vagenas, Cesar Berrospi Ramis, Matteo Omenetti, Kasper Dinkla, Yusik Kim, Shubham Gupta, Rafael Teixeira de Lima, Valery Weber, Lucas Morin, Ingmar Meijer, Viktor Kuropiatnyk, and Peter W. J. Staar. Docling: An efficient open-source toolkit for ai-driven document convers...

  20. [20]

    OctoTools: An Agentic Framework with Extensible Tools for Complex Reasoning

    Pan Lu, Bowen Chen, Sheng Liu, Rahul Thapa, Joseph Boen, and James Zou. Octo- Tools: An agentic framework with extensible tools for complex reasoning.arXiv preprint arXiv:2502.11271, 2025

  21. [21]

    Ui layout generation with llms guided by ui grammar.arXiv preprint arXiv:2310.15455, 2023

    Yuwen Lu, Ziang Tong, Qinyi Zhao, Chengzhi Zhang, and Toby Jia-Jun Li. Ui layout generation with llms guided by ui grammar.arXiv preprint arXiv:2310.15455, 2023

  22. [22]

    GlyphDraw2: Automatic generation of complex glyph posters with diffusion models and large language models.arXiv preprint arXiv:2407.02252, 2024

    Jian Ma, Yonglin Deng, Chen Chen, Nanyang Du, Haonan Lu, and Zhenyu Yang. GlyphDraw2: Automatic generation of complex glyph posters with diffusion models and large language models.arXiv preprint arXiv:2407.02252, 2024

  23. [23]

    Presentations by the humans and for the humans: Harnessing llms for generating persona-aware slides from documents

    Ishani Mondal, Shwetha S, Anandhavelu Natarajan, Aparna Garimella, Sambaran Bandyopad- hyay, and Jordan Boyd-Graber. Presentations by the humans and for the humans: Harnessing llms for generating persona-aware slides from documents. InProceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics, pages 2664–26...

  24. [24]

    GPT-4o system card

    OpenAI. GPT-4o system card. https://openai.com/index/gpt-4o-system-card/ , 2024

  25. [25]

    GPT-5 system card.https://openai.com/index/gpt-5-system-card/, 2025

    OpenAI. GPT-5 system card.https://openai.com/index/gpt-5-system-card/, 2025

  26. [26]

    Paper2Poster: Towards multimodal poster automation from scientific papers, 2025

    Wei Pang, Kevin Qinghong Lin, Xiangru Jian, Xi He, and Philip Torr. Paper2Poster: Towards multimodal poster automation from scientific papers, 2025

  27. [27]

    ART: Automatic multi-step reasoning and tool-use for large language models

    Bhargavi Paranjape, Scott Lundberg, Sameer Singh, Hannaneh Hajishirzi, Luke Zettlemoyer, and Marco Tulio Ribeiro. ART: Automatic multi-step reasoning and tool-use for large language models.arXiv preprint arXiv:2303.09014, 2023

  28. [28]

    marker: Convert pdf to markdown and json quickly with high accuracy

    Vik Paruchuri. marker: Convert pdf to markdown and json quickly with high accuracy. https: //github.com/VikParuchuri/marker, 2025. Software project

  29. [29]

    Nassar, and Peter W

    Birgit Pfitzmann, Christoph Auer, Michele Dolfi, Ahmed S. Nassar, and Peter W. J. Staar. DocLayNet: A large human-annotated dataset for document-layout analysis. InProceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 3743–3751, 2022

  30. [30]

    Data cards: Purposeful and trans- parent dataset documentation for responsible ai

    Mahima Pushkarna, Andrew Zaldivar, and Oddur Kjartansson. Data cards: Purposeful and trans- parent dataset documentation for responsible ai. InProceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, pages 1776–1826, 2022

  31. [31]

    Learning to generate posters of scientific papers by probabilistic graphical models, 2017

    Yu-ting Qiang, Yanwei Fu, Xiao Yu, Yanwen Guo, Zhi-Hua Zhou, and Leonid Sigal. Learning to generate posters of scientific papers by probabilistic graphical models, 2017

  32. [32]

    Rodriguez, Xiangru Jian, Siba Smarak Panigrahi, Tianyu Zhang, Aarash Feizi, Abhay Puri, Akshay Kalkunte Suresh, François Savard, Ahmed Masry, Shravan Nayak, et al

    Juan A. Rodriguez, Xiangru Jian, Siba Smarak Panigrahi, Tianyu Zhang, Aarash Feizi, Abhay Puri, Akshay Kalkunte Suresh, François Savard, Ahmed Masry, Shravan Nayak, et al. BigDocs: An open dataset for training multimodal models on document and code tasks. InThe Thirteenth International Conference on Learning Representations, 2025

  33. [33]

    PosterSum: A multimodal benchmark for scientific poster summarization.arXiv preprint arXiv:2502.17540, 2025

    Rohit Saxena, Pasquale Minervini, and Frank Keller. PosterSum: A multimodal benchmark for scientific poster summarization.arXiv preprint arXiv:2502.17540, 2025

  34. [34]

    Toolformer: Language Models Can Teach Themselves to Use Tools

    Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Eric Hambro, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. Toolformer: Language models can teach themselves to use tools.arXiv preprint arXiv:2302.04761, 2023. 11

  35. [35]

    SlideGen: An abstractive section-based slide generator for scholarly documents

    Athar Sefid, Prasenjit Mitra, and Lee Giles. SlideGen: An abstractive section-based slide generator for scholarly documents. InProceedings of the 21st ACM Symposium on Document Engineering. Association for Computing Machinery, 2021

  36. [36]

    De- sign2Code: Benchmarking multimodal code generation for automated front-end engineering

    Chenglei Si, Yanzhe Zhang, Ryan Li, Zhengyuan Yang, Ruibo Liu, and Diyi Yang. De- sign2Code: Benchmarking multimodal code generation for automated front-end engineering. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 3956–3974. Association fo...

  37. [37]

    Edward Sun, Yufang Hou, Dakuo Wang, Yunfeng Zhang, and Nancy X. R. Wang. D2S: Document-to-slide generation via query-based text summarization. InProceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1405–1418. Association for Computational Linguistics, 2021

  38. [38]

    P2P: Automated paper-to-poster generation and fine-grained benchmark, 2025

    Tao Sun, Enhao Pan, Zhengkai Yang, Kaixin Sui, Jiajun Shi, Xianfu Cheng, Tongliang Li, Wenhao Huang, Ge Zhang, Jian Yang, and Zhoujun Li. P2P: Automated paper-to-poster generation and fine-grained benchmark, 2025

  39. [39]

    PosterBot: A system for generating posters of scientific papers with neural models

    Sheng Xu and Xiaojun Wan. PosterBot: A system for generating posters of scientific papers with neural models. InProceedings of the AAAI Conference on Artificial Intelligence, volume 36, pages 13233–13235, 2022

  40. [40]

    Poster- LLaV A: Constructing a unified multi-modal layout generator with llm.arXiv preprint arXiv:2406.02884, 2024

    Tao Yang, Yingmin Luo, Zhongang Qi, Yang Wu, Ying Shan, and Chang Wen Chen. Poster- LLaV A: Constructing a unified multi-modal layout generator with llm.arXiv preprint arXiv:2406.02884, 2024

  41. [41]

    Narasimhan, and Yuan Cao

    Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik R. Narasimhan, and Yuan Cao. ReAct: Synergizing reasoning and acting in language models. InThe Eleventh International Conference on Learning Representations, 2023

  42. [42]

    PosterGen: Aesthetic-aware paper-to-poster generation via multi-agent llms, 2025

    Zhilin Zhang, Xiang Zhang, Jiaqi Wei, Yiwei Xu, and Chenyu You. PosterGen: Aesthetic-aware paper-to-poster generation via multi-agent llms, 2025

  43. [43]

    PPTAgent: Generating and evaluating presentations beyond text-to-slides.arXiv preprint arXiv:2501.03936, 2025

    Hao Zheng, Xinyan Guan, Hao Kong, Jia Zheng, Weixiang Zhou, Hongyu Lin, Yaojie Lu, Ben He, Xianpei Han, and Le Sun. PPTAgent: Generating and evaluating presentations beyond text-to-slides.arXiv preprint arXiv:2501.03936, 2025

  44. [44]

    Xing, Hao Zhang, Joseph E

    Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric P. Xing, Hao Zhang, Joseph E. Gonzalez, and Ion Stoica. Judging llm-as-a-judge with mt-bench and chatbot arena, 2023

  45. [45]

    issues": [

    Xu Zhong, Jianbin Tang, and Antonio Jimeno Yepes. PubLayNet: Largest dataset ever for document layout analysis. In2019 International Conference on Document Analysis and Recognition, pages 1015–1022, 2019. 12 Appendix Contents ADataset Details .....................................................................................................................