arxiv: 2511.22490 · v2 · submitted 2025-11-27 · 💻 cs.CV · cs.IR

SciPostGen: Bridging the Gap between Scientific Papers and Poster Layouts

Pith reviewed 2026-05-17 04:42 UTC · model grok-4.3

classification 💻 cs.CV cs.IR
keywords poster layout generation · scientific papers · retrieval-augmented generation · layout dataset · visual communication · computer vision · paper structure analysis

The pith

SciPostGen dataset shows paper structures are tied to the number of elements in their posters and supports retrieval-augmented layout generation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces SciPostGen, a large-scale dataset that pairs scientific papers with annotated poster layouts to study how the two correspond. Analyses on the dataset show that paper structure is associated with the number of layout elements in the corresponding poster. Building on this, the authors propose a Retrieval-Augmented Poster Layout Generation framework that retrieves layouts consistent with a given paper and uses them as guidance when generating a new layout. Experiments cover two conditions, with and without the layout constraints that poster creators typically specify, and report that the retriever estimates layouts aligned with paper structure and that the generated layouts satisfy the given constraints. The work targets a practical need: turning dense papers into clear visual presentations.

Core claim

Paper structures are associated with the number of layout elements in posters. The SciPostGen dataset provides paired annotations at scale to examine this link. A retrieval-augmented framework retrieves layouts consistent with a given paper's structure and employs them as guidance to generate new layouts that satisfy additional constraints specified by poster creators.

What carries the argument

Retrieval-Augmented Poster Layout Generation framework, which retrieves past layouts aligned with a paper's structure and uses them to guide generation of new constraint-aware layouts.
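
The retrieval step can be pictured with a minimal sketch. It assumes papers are already encoded as fixed-length vectors and that candidates are ranked by cosine similarity; this page does not describe the paper's actual encoder or scoring function, and the names `cosine`, `retrieve_layouts`, and the toy database are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve_layouts(query_vec, database, k=2):
    """Return the k layouts whose paper embeddings are most similar to query_vec.

    database: list of (paper_embedding, layout) pairs (hypothetical format).
    """
    ranked = sorted(database, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    return [layout for _, layout in ranked[:k]]

# Toy data: 3-d embeddings, layouts written as lists of element types (illustrative only).
database = [
    ([1.0, 0.0, 0.0], ["title", "text", "figure"]),
    ([0.9, 0.1, 0.0], ["title", "text", "figure", "figure"]),
    ([0.0, 1.0, 0.0], ["title", "table", "text"]),
]
retrieved = retrieve_layouts([1.0, 0.05, 0.0], database, k=2)
```

In this sketch the retrieved layouts would then be passed to a generator as guidance; how that conditioning is done is outside what the sketch covers.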

If this is right

  • Paper structure can be used to estimate the appropriate number and type of elements for a poster.
  • Retrieval from a database of past posters improves alignment between paper content and generated layout.
  • The framework produces usable layouts under both constrained and unconstrained conditions.
  • Public release of the dataset enables further study of paper-to-poster mappings.
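
The first consequence above can be illustrated with a small sketch: given layouts retrieved for structurally similar papers, take the median count of each element type as an estimate for the new poster. The median aggregation, the function name `estimate_element_counts`, and the element-type labels are assumptions made for illustration, not the paper's method.

```python
from collections import Counter
from statistics import median

def estimate_element_counts(retrieved_layouts):
    """Median count of each element type across the retrieved layouts.

    retrieved_layouts: list of layouts, each a list of element-type strings
    (hypothetical representation).
    """
    per_layout = [Counter(layout) for layout in retrieved_layouts]
    types = sorted({t for counts in per_layout for t in counts})
    # A type missing from a layout counts as 0 for that layout.
    return {t: median(counts.get(t, 0) for counts in per_layout) for t in types}

neighbours = [
    ["title", "text", "figure"],
    ["title", "text", "text", "figure", "figure"],
    ["title", "text", "figure", "table"],
]
estimate = estimate_element_counts(neighbours)
```

A median is used here only because it is robust to a single unusual neighbour; a mean or a learned predictor would be equally plausible choices.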

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Automated poster tools could become standard aids for researchers preparing conference submissions.
  • The same retrieval idea might extend to generating slides or figure arrangements from paper text.
  • Large-scale paired datasets like this could reveal broader patterns in how scientists choose to visualize their work.

Load-bearing premise

Layouts retrieved from past papers can reliably guide generation for new papers without introducing uncorrectable style mismatches or content omissions.

What would settle it

A test set of papers with novel structures where the generated layouts consistently violate given constraints or show element counts far from those predicted by retrieved similar papers.
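
A check of this kind could be scripted roughly as follows. The sketch assumes a layout is a list of elements with a type and a normalised `(x, y, w, h)` box, and that constraints are required element counts plus staying on the canvas; the representation and the names `violations` and `count_gap` are hypothetical.

```python
def violations(layout, required_counts, canvas=(1.0, 1.0)):
    """List human-readable constraint violations for one generated layout."""
    problems = []
    width, height = canvas
    for i, el in enumerate(layout):
        x, y, w, h = el["box"]
        if x < 0 or y < 0 or x + w > width or y + h > height:
            problems.append(f"element {i} ({el['type']}) leaves the canvas")
    for el_type, want in required_counts.items():
        have = sum(1 for el in layout if el["type"] == el_type)
        if have != want:
            problems.append(f"{el_type}: expected {want}, got {have}")
    return problems

def count_gap(layout, predicted_total):
    """Absolute gap between the generated element count and the predicted count."""
    return abs(len(layout) - predicted_total)

generated = [
    {"type": "title", "box": (0.0, 0.0, 1.0, 0.125)},
    {"type": "text", "box": (0.0, 0.125, 0.5, 0.5)},
    {"type": "figure", "box": (0.75, 0.125, 0.5, 0.5)},  # 0.75 + 0.5 exceeds the canvas width
]
problems = violations(generated, required_counts={"figure": 2})
gap = count_gap(generated, predicted_total=5)
```

Running such a check over a held-out set of structurally unusual papers, and reporting the violation rate and the distribution of `count_gap`, would give the kind of evidence described above.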

Figures

Figures reproduced from arXiv: 2511.22490 by Atsushi Hashimoto, Koichiro Yoshino, Shohei Tanaka, Shun Inadumi, Tosho Hirasawa, Yoshitaka Ushiku.

Figure 1: Overview of Retrieval-Augmented Poster Layout Gen
Figure 2: Annotation pipeline of SciPostGen: Paper and poster
Figure 4: Correlation analyses between paper structures and poster layouts in SciPostGen test split: Higher absolute Spearman’s coefficients
Figure 5: Overview of the Retrieval-Augmented Poster Layout Generation framework under the automatic setting.
Figure 6: Examples of retrieved results under the (a)
Figure 7: Examples of generation results under the (a)
Figure 8: Example of annotations in SciPostGen, including automatically extracted paper and poster annotations and manually corrected
Figure 9: Correlation analyses between paper structures and poster layouts in SciPostGen train split
Figure 10: Detailed architecture of the paper encoder
Figure 11: Ablation results for the retriever: (a) Effect of changing
Figure 12: Failure cases of the retriever: we show the retrieved
Original abstract

As the number of scientific papers continues to grow, there is a demand for approaches that can effectively convey research findings, with posters serving as a key medium for presenting paper contents. Poster layouts determine how effectively research is communicated and understood, highlighting their growing importance. In particular, a gap remains in understanding how papers correspond to the layouts that present them, which calls for datasets with paired annotations at scale. To bridge this gap, we introduce SciPostGen, a large-scale dataset for understanding and generating poster layouts from scientific papers. Our analyses based on SciPostGen show that paper structures are associated with the number of layout elements in posters. Based on this insight, we explore a framework, Retrieval-Augmented Poster Layout Generation, which retrieves layouts consistent with a given paper and uses them as guidance for layout generation. We conducted experiments under two conditions: with and without layout constraints typically specified by poster creators. The results show that the retriever estimates layouts aligned with paper structures, and our framework generates layouts that also satisfy given constraints. The dataset and code are publicly available at https://omron-sinicx.github.io/paper2layout/.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces SciPostGen, a large-scale dataset pairing scientific papers with annotated poster layouts. Analyses on the dataset establish associations between paper structures and the number of layout elements in posters. The authors propose a Retrieval-Augmented Poster Layout Generation framework that retrieves past layouts consistent with a new paper and uses them to guide generation; experiments in constrained and unconstrained settings report that the retriever produces layouts aligned with paper structures and that the generated layouts satisfy given constraints. The dataset and code are released publicly.

Significance. If the central claims hold, the work supplies a practical resource for automating poster layout creation from papers, a task relevant to scientific communication. The public dataset and code release is a clear strength that supports reproducibility and follow-on research. The retrieval-augmented approach offers a concrete way to leverage historical structure-layout correlations, though its value hinges on the fidelity of the retrieval step.

major comments (2)
  1. Abstract and retrieval framework description: the claim that 'the retriever estimates layouts aligned with paper structures' provides no explicit definition or isolated metric for structural alignment (e.g., section hierarchy, figure-to-text ratio, or element ordering) separate from generic embedding similarity. This leaves open whether reported alignment reflects logical structure or surface-level topic/style features, directly affecting the reliability of the downstream generation step and the claimed association between paper structures and layout element counts.
  2. Experiments section: the reported results under constrained and unconstrained settings assert alignment and constraint satisfaction, yet the abstract and available description do not include quantitative tables, baseline comparisons, or error-bar details that would allow verification of improvement margins or robustness.
minor comments (1)
  1. The abstract would benefit from a brief statement of dataset scale (number of paper-poster pairs) and annotation methodology to give readers an immediate sense of scope.
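
The error bars requested in major comment 2 could be produced in several ways; one common option is a percentile bootstrap over per-example scores. The sketch below is only an illustration under that assumption: the function name `bootstrap_ci`, the resample count, and the toy scores are hypothetical and do not come from the paper.

```python
import random

def bootstrap_ci(values, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of values."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_resamples))
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

# Toy per-poster scores (illustrative only, not results from the paper).
scores = [0.5, 0.75, 0.25, 1.0, 0.5, 0.75]
low, high = bootstrap_ci(scores)
```

Reporting `(low, high)` next to each mean would let readers judge whether the margin over a baseline exceeds sampling noise.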

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment below and outline the revisions we will make to strengthen the clarity and rigor of the presentation.

  1. Referee: Abstract and retrieval framework description: the claim that 'the retriever estimates layouts aligned with paper structures' provides no explicit definition or isolated metric for structural alignment (e.g., section hierarchy, figure-to-text ratio, or element ordering) separate from generic embedding similarity. This leaves open whether reported alignment reflects logical structure or surface-level topic/style features, directly affecting the reliability of the downstream generation step and the claimed association between paper structures and layout element counts.

    Authors: We agree that an explicit definition and isolated metric would strengthen the claim. Our dataset analyses demonstrate a clear statistical association between paper section structures (e.g., counts of sections such as Introduction, Methods, Results) and the number and type of layout elements (figures, text blocks, tables) in the corresponding posters. The retriever operates on embeddings derived from the full paper text, which encode both topical content and structural cues such as section ordering and relative lengths. To address the concern directly, we will add a new subsection in the revised manuscript that formally defines structural alignment as the degree to which retrieved layouts preserve the paper's section-to-element mapping (measured by normalized element-count correlation per section type). We will also report an isolated metric, such as the Pearson correlation between paper section statistics and retrieved poster element distributions, separate from overall embedding cosine similarity, to demonstrate that the alignment is not reducible to surface-level topic or style features. revision: yes

  2. Referee: Experiments section: the reported results under constrained and unconstrained settings assert alignment and constraint satisfaction, yet the abstract and available description do not include quantitative tables, baseline comparisons, or error-bar details that would allow verification of improvement margins or robustness.

    Authors: The full experiments section contains quantitative tables reporting alignment scores (e.g., layout element matching and structural similarity) and constraint satisfaction rates for both settings, along with comparisons against non-retrieval baselines. However, we acknowledge that these details are not summarized in the abstract and that error bars and additional robustness checks could improve verifiability. In the revision we will (i) add error bars to all reported metrics, (ii) expand the baseline comparisons to include at least one additional retrieval-free generation method, and (iii) include a concise summary of the key quantitative improvements in the abstract or introduction to facilitate quick verification. revision: yes
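
The isolated metric discussed in response 1 amounts to correlating paper-side statistics with poster-side element counts. A minimal sketch follows, showing both the Pearson coefficient named in the response and a Spearman rank coefficient of the kind mentioned in the Figure 4 caption. The data are invented for illustration and the helper names are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman correlation computed as Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

# Invented example: section counts per paper vs. element counts per retrieved poster.
section_counts = [4, 5, 6, 7, 9]
element_counts = [10, 12, 11, 15, 20]
rho = spearman(section_counts, element_counts)
r = pearson(section_counts, element_counts)
```

Reporting such a coefficient alongside embedding similarity would help separate structural alignment from topical similarity, which is the distinction the referee asks for.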

Circularity Check

0 steps flagged

No circularity: empirical results rest on external dataset and public code


The paper introduces the SciPostGen dataset with paired paper-poster annotations and reports empirical analyses linking paper structures to layout element counts. The Retrieval-Augmented Poster Layout Generation framework retrieves prior layouts for guidance and is evaluated under constraint conditions, with all reported alignment and generation outcomes derived from the released dataset and code. No equations, self-definitional reductions, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided text that would collapse the central claims to inputs by construction. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The work rests on standard assumptions from computer vision and information retrieval: that visual layout elements can be reliably extracted from posters, that paper section structure is a sufficient proxy for layout needs, and that nearest-neighbor retrieval in embedding space yields useful guidance. No new physical constants or ad-hoc fitted scalars are introduced in the abstract.

axioms (2)
  • [domain assumption] Poster layouts can be decomposed into a finite set of countable visual elements whose counts correlate with paper section structure.
    Invoked when the authors state that analyses show paper structures are associated with the number of layout elements.
  • [domain assumption] Embedding similarity between papers is a reliable indicator of compatible poster layouts.
    Underlies the retrieval step of the proposed framework.

pith-pipeline@v0.9.0 · 5518 in / 1379 out tokens · 21752 ms · 2026-05-17T04:42:25.490073+00:00 · methodology

Supplementary material (excerpt)

A. Dataset Details. Overview. Figure 8 illustrates an example of the annotated components in SciPostGen, a dataset comprising 18,097 pairs of scientific papers and their corresponding posters. In the main text, we focused on the paper content annotation...