PosterForest: Hierarchical Multi-Agent Collaboration for Scientific Poster Generation

Hyunjung Shim; Jiho Choi; Seojeong Park; Seongjong Song

arxiv: 2508.21720 · v2 · submitted 2025-08-29 · 💻 cs.AI

PosterForest: Hierarchical Multi-Agent Collaboration for Scientific Poster Generation

Jiho Choi , Seojeong Park , Seongjong Song , Hyunjung Shim This is my paper

Pith reviewed 2026-05-18 20:17 UTC · model grok-4.3

classification 💻 cs.AI

keywords scientific poster generationmulti-agent collaborationhierarchical reasoningtraining-free frameworkdocument hierarchycontent layout planningrecursive refinement

0 comments

The pith

PosterForest generates scientific posters from raw documents by structuring them as a Poster Tree and refining content and layout through hierarchical multi-agent reasoning without training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces PosterForest as a training-free way to turn scientific documents into posters. It creates a Poster Tree to hold the document's hierarchy and both visual and textual details at different scales. Content agents and layout agents then work together with recursive refinement, starting from overall structure and moving down to specific sections. This joint process is intended to keep more information, maintain logical connections, and achieve better visual spacing than methods that summarize flatly or handle content and layout in isolation. If successful, it would let researchers produce usable posters directly from papers without extra data or model adjustments.

Core claim

PosterForest introduces the Poster Tree as a structured intermediate representation that captures document hierarchy and visual-textual semantics across multiple levels. Content and layout agents perform hierarchical reasoning and recursive refinement, progressively optimizing the poster from global organization to local composition. This joint optimization improves semantic coherence, logical flow, and visual harmony, and experiments show it outperforms prior methods in both automatic and human evaluations without additional training or domain-specific supervision.

What carries the argument

The Poster Tree, a structured intermediate representation that captures document hierarchy and visual-textual semantics across multiple levels, which allows content and layout agents to carry out hierarchical reasoning and recursive refinement from global to local scales.

If this is right

Generated posters retain more document information and show stronger logical connections than those from flat summarization techniques.
Joint optimization by content and layout agents produces better visual harmony than optimizing each aspect separately.
The method delivers measurable gains in automatic metrics and human preference scores across tested documents.
No task-specific training data or domain supervision is required for the framework to function on scientific material.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same tree-based hierarchy could be tested for turning documents into other visual formats such as slide decks or summary infographics.
Researchers preparing conference submissions might integrate the system to speed up initial poster drafts before manual polishing.
Extending the agents to include image-generation steps could address gaps in visual element quality that the current text-and-layout focus leaves open.

Load-bearing premise

The Poster Tree representation together with hierarchical reasoning by separate content and layout agents is enough to produce coherent and well-balanced posters from raw documents without task-specific training or supervision.

What would settle it

Evaluating PosterForest on a new collection of scientific papers and finding that human raters score the outputs lower than baseline methods on measures of logical flow or visual balance would show the hierarchical approach does not deliver the claimed gains.

Figures

Figures reproduced from arXiv: 2508.21720 by Hyunjung Shim, Jiho Choi, Seojeong Park, Seongjong Song.

**Figure 1.** Figure 1: Limitations of Current SPG Methods. Existing state-of-the-art scientific poster generation (SPG) methods, including P2P (Sun et al. 2025) and Paper2Poster (Pang et al. 2025), lack hierarchical document understanding, resulting in errors in both content and layout. (a) shows an example where an experiment table is incorrectly placed in the conclusion section. (b) illustrates an overly simplified poster, wh… view at source ↗

**Figure 2.** Figure 2: Overview of PosterForest. pers. These approaches serve as effective baselines by leveraging multimodal large language models (MLLMs, e.g., GPT-4 (Achiam et al. 2023), Gemini (Team et al. 2023), Qwen (Bai et al. 2023)) to extract textual and visual content from papers, summarize key information, and arrange it within a panel layout. Among them, Paper2Poster adopts a modular approach comprised of (a) a par… view at source ↗

**Figure 3.** Figure 3: Modification Planning. The Poster Tree and layout are iteratively updated through the shared decision of layout and Content Agent. 3.3.2 Iterative Tree Refinement (Tree-level) Following node-level modifications, the system conducts iterative tree traversal to apply changes across the entire Poster Tree. Starting from the root, the system visits each node sequentially, often in a breadth-first manner, and… view at source ↗

**Figure 4.** Figure 4: Qualitative Comparison. Posters generated with the GPT-4o framework of baseline methods and PosterForest, based on papers spanning different AI fields (computer vision, NLP, RL), along with the original posters (GT) created by the authors. Method Content Esthetics Structure Overall 4o-HTML 2.3 % 1.8 % 2.7 % 1.8 % P2P 10.5 % 22.3 % 13.6 % 13.2 % Paper2Poster 34.1 % 24.1 % 25.0 % 26.9 % Ours 53.2 % 51.8 % 58… view at source ↗

**Figure 5.** Figure 5: Ablation Study on the Effect of Content and Layout Agents. Root |-- Section 1 |-- Subsection 1.1 |-- Section 2 |-- Subsection 2.1 |-- Subsection 2.2 Root |-- Section 1 | ‘-- Subsection 1.1 ‘-- Section 2 | |-- Subsection 2.1 | ‘-- Subsection 2.2 (a) w/o Hierarchical (b) w/ Hierarchical [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗

**Figure 6.** Figure 6: Ablation Study on the Effect of Hierarchical Content Tree. 4.4 User Study To conduct a user study to evaluate poster quality from a human perspective, we recruited 22 participants, all of whom were graduate students (master’s level or above) and had participated in scientific conferences. The study uses 10 sets, each consisting of a group of posters and four evaluation questions. Each poster group is gene… view at source ↗

read the original abstract

Automating scientific poster generation requires hierarchical document understanding and coherent content-layout planning. Existing methods often rely on flat summarization or optimize content and layout separately. As a result, they often suffer from information loss, weak logical flow, and poor visual balance. We present PosterForest, a training-free framework for scientific poster generation. Our method introduces the Poster Tree, a structured intermediate representation that captures document hierarchy and visual-textual semantics across multiple levels. Building on this representation, content and layout agents perform hierarchical reasoning and recursive refinement, progressively optimizing the poster from global organization to local composition. This joint optimization improves semantic coherence, logical flow, and visual harmony. Experiments show that PosterForest outperforms prior methods in both automatic and human evaluations, without additional training or domain-specific supervision.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

PosterForest introduces a Poster Tree with recursive content and layout agents for training-free poster generation, but the experiments lack ablations and details needed to credit the hierarchy for the gains.

read the letter

The main point is a training-free framework that turns a paper into a Poster Tree and then runs content and layout agents through recursive refinement to produce the final poster. This joint handling of semantics and visuals is the core new piece, and it directly targets the information loss and poor flow that come from flat summarization or separate optimization steps. The paper does a clear job explaining why existing approaches fall short on logical structure and visual balance, and the recursive top-down to bottom-up process is a sensible way to keep global organization while fixing local details. That framing is useful even if the implementation details are still high-level in the abstract. The training-free claim is also straightforward and avoids the usual data-hungry pitfalls in this area. On the soft spots, the experimental support is thin. The abstract states outperformance on automatic and human evaluations, yet gives no concrete baselines, metrics, statistical tests, or dataset description. More importantly, there is no ablation that isolates the Poster Tree and hierarchy from a simpler multi-agent or structured-prompt baseline on the same inputs. Without that control, it is difficult to know whether the reported improvements come from the claimed structure or just from using capable agents in the first place. That is a real gap for a central claim. This work is aimed at people building tools for scientific communication or exploring hierarchical multi-agent planning for creative tasks. A reader interested in document understanding or practical LLM applications could pick up the Poster Tree idea and try it in related settings. It deserves a serious referee because the problem is practical, the proposed structure is well-motivated, and the training-free constraint is honest. I would send it for review but ask the authors to add the missing ablations and full experimental reporting before acceptance.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes PosterForest, a training-free framework for scientific poster generation from raw documents. It introduces the Poster Tree as a hierarchical intermediate representation capturing document structure and visual-textual semantics at multiple levels. Content and layout agents then perform hierarchical reasoning with recursive refinement to jointly optimize global organization and local composition. The central claim is that this yields better semantic coherence, logical flow, and visual balance than prior flat-summarization or separately-optimized methods, with experiments showing outperformance in both automatic and human evaluations without any task-specific training or supervision.

Significance. If the empirical results and attribution to the Poster Tree plus hierarchy hold, the work would represent a meaningful step toward automated scientific communication tools that preserve document structure without domain-specific fine-tuning. The training-free design and explicit joint content-layout optimization are clear strengths that could broaden applicability. The introduction of the Poster Tree as a structured representation is a concrete technical contribution worth noting.

major comments (2)

[Abstract / Experiments] Abstract and Experiments: the claim that PosterForest 'outperforms prior methods in both automatic and human evaluations' is load-bearing for the paper's contribution, yet the manuscript supplies no details on baselines, concrete metrics (e.g., what automatic scores measure coherence or balance?), dataset construction, number of test documents, or statistical tests. This leaves the central empirical assertion without visible support and prevents readers from assessing whether gains are reliable.
[Method / Experiments] Method and Ablation: the framework's novelty rests on the Poster Tree representation together with hierarchical content/layout agents and recursive refinement. No ablation is reported that removes the tree (e.g., flat single-pass prompting on identical documents and LLM) or compares against a non-recursive multi-agent baseline. Without such a control, it remains unclear whether reported improvements in coherence and balance are driven by the claimed hierarchical structure or simply by structured multi-agent prompting in general.

minor comments (2)

[Method] Clarify the precise recursive refinement procedure and termination criteria for the agents; a short pseudocode or numbered step list would improve reproducibility.
[Experiments] Ensure all automatic metrics are defined with explicit formulas or references in the evaluation section.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the empirical support and the attribution of our contributions. We address each major comment below and will revise the manuscript accordingly.

read point-by-point responses

Referee: [Abstract / Experiments] Abstract and Experiments: the claim that PosterForest 'outperforms prior methods in both automatic and human evaluations' is load-bearing for the paper's contribution, yet the manuscript supplies no details on baselines, concrete metrics (e.g., what automatic scores measure coherence or balance?), dataset construction, number of test documents, or statistical tests. This leaves the central empirical assertion without visible support and prevents readers from assessing whether gains are reliable.

Authors: We agree that the current manuscript would benefit from greater explicitness in these areas to allow readers to fully evaluate the results. We will revise the Experiments section (and update the abstract if space permits) to include a clear enumeration of all baselines, precise definitions of the automatic metrics with explanations of how they quantify semantic coherence, logical flow, and visual balance, details on dataset construction and the number of test documents, and results of statistical significance tests such as paired t-tests with p-values. revision: yes
Referee: [Method / Experiments] Method and Ablation: the framework's novelty rests on the Poster Tree representation together with hierarchical content/layout agents and recursive refinement. No ablation is reported that removes the tree (e.g., flat single-pass prompting on identical documents and LLM) or compares against a non-recursive multi-agent baseline. Without such a control, it remains unclear whether reported improvements in coherence and balance are driven by the claimed hierarchical structure or simply by structured multi-agent prompting in general.

Authors: We acknowledge that an explicit ablation isolating the Poster Tree and recursive hierarchy would strengthen the case for our specific design choices. The current experiments compare against prior flat or separately-optimized methods but do not include an internal control using flat single-pass prompting or a non-recursive multi-agent setup on identical documents and the same LLM. We will add these ablations in the revised manuscript to directly attribute performance gains to the hierarchical components. revision: yes

Circularity Check

0 steps flagged

No circularity: training-free framework with novel components and no equations or fitted predictions

full rationale

The paper presents PosterForest as a training-free multi-agent framework that introduces the Poster Tree representation and hierarchical content/layout agents for recursive refinement. The abstract and provided text contain no equations, no parameter fitting to data subsets, and no predictions that reduce to inputs by construction. Claims of outperformance rest on experimental evaluations rather than any self-referential derivation or self-citation chain that would make the central result tautological. The derivation chain is self-contained as a descriptive system design without load-bearing reductions to prior fitted quantities or author-specific uniqueness theorems.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim depends on the effectiveness of the newly introduced Poster Tree and the assumption that untrained agents can perform useful hierarchical reasoning and refinement.

axioms (1)

domain assumption Multi-agent systems can perform effective hierarchical reasoning and recursive refinement for document understanding without task-specific training.
Invoked when the abstract states that content and layout agents progressively optimize from global to local composition.

invented entities (1)

Poster Tree no independent evidence
purpose: Structured intermediate representation capturing document hierarchy and visual-textual semantics across multiple levels.
Newly introduced as the core representation enabling the agent collaboration.

pith-pipeline@v0.9.0 · 5659 in / 1243 out tokens · 35696 ms · 2026-05-18T20:17:53.537213+00:00 · methodology

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

AI for Auto-Research: Roadmap & User Guide
cs.AI 2026-05 unverdicted novelty 4.0

The paper delivers a stage-by-stage roadmap for AI in research, showing reliable assistance in retrieval and tool tasks but fragility in novelty and judgment, advocating human-governed collaboration.

Reference graph

Works this paper leans on

25 extracted references · 25 canonical work pages · cited by 1 Pith paper · 11 internal anchors

[1]

GPT-4 Technical Report

Gpt-4 technical report. arXiv preprint arXiv:2303.08774. Bai, J.; Bai, S.; Chu, Y .; Cui, Z.; Dang, K.; Deng, X.; Fan, Y .; Ge, W.; Han, Y .; Huang, F.; et al

work page internal anchor Pith review Pith/arXiv arXiv
[2]

Qwen Technical Report

Qwen technical report. arXiv preprint arXiv:2309.16609. Besta, M.; Blach, N.; Kubicek, A.; Gerstenberger, R.; Podstawski, M.; Gianinazzi, L.; Gajda, J.; Lehmann, T.; Niewiadomski, H.; Nyczyk, P.; et al

work page internal anchor Pith review Pith/arXiv arXiv
[3]

Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks

Program of thoughts prompting: Disentangling computa- tion from reasoning for numerical reasoning tasks. arXiv preprint arXiv:2211.12588. Du, Y .; Li, S.; Torralba, A.; Tenenbaum, J. B.; and Mordatch, I

work page internal anchor Pith review Pith/arXiv arXiv
[4]

MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework

Metagpt: Meta programming for multi-agent collaborative framework. arXiv preprint arXiv:2308.00352, 3(4):

work page internal anchor Pith review Pith/arXiv arXiv
[5]

arXiv preprint arXiv:2505.23885

Owl: Optimized work- force learning for general multi-agent assistance in real- world task automation. arXiv preprint arXiv:2505.23885. Huang, Y .; Lv, T.; Cui, L.; Lu, Y .; and Wei, F

work page arXiv
[6]

arXiv preprint arXiv:2405.20213

Postdoc: Generating poster from a long multimodal document using deep submodular optimization. arXiv preprint arXiv:2405.20213. Larsen, P.; and V on Ins, M

work page arXiv
[7]

In Findings of the association for computational linguistics ACL 2024, 11286–11315

Prometheus-vision: Vision-language model as a judge for fine-grained evaluation. In Findings of the association for computational linguistics ACL 2024, 11286–11315. Li, G.; Hammoud, H.; Itani, H.; Khizbullin, D.; and Ghanem, B

work page 2024
[8]

Chain of ideas: Revolutionizing research via novel idea development with llm agents

Chain of ideas: Revolutionizing research via novel idea development with llm agents. arXiv preprint arXiv:2410.13185. Liang, T.; He, Z.; Jiao, W.; Wang, X.; Wang, Y .; Wang, R.; Yang, Y .; Shi, S.; and Tu, Z

work page arXiv
[9]

Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate

Encouraging divergent thinking in large language models through multi-agent de- bate. arXiv preprint arXiv:2305.19118. Lin, J.; Guo, J.; Sun, S.; Yang, Z.; Lou, J.-G.; and Zhang, D

work page internal anchor Pith review Pith/arXiv arXiv
[10]

In Proceedings of the 2004 conference on empiri- cal methods in natural language processing, 404–411

Textrank: Bringing order into text. In Proceedings of the 2004 conference on empiri- cal methods in natural language processing, 404–411. Pang, W.; Lin, K. Q.; Jian, X.; He, X.; and Torr, P

work page 2004
[11]

arXiv preprint arXiv:2505.21497

Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers. arXiv preprint arXiv:2505.21497. Qian, C.; Liu, W.; Liu, H.; Chen, N.; Dang, Y .; Li, J.; Yang, C.; Chen, W.; Su, Y .; Cong, X.; et al

work page arXiv
[12]

ChatDev: Communicative Agents for Software Development

Chatdev: Com- municative agents for software development. arXiv preprint arXiv:2307.07924. Qiang, Y .; Fu, Y .; Guo, Y .; Zhou, Z.-H.; and Sigal, L

work page internal anchor Pith review Pith/arXiv arXiv
[13]

arXiv preprint arXiv:2502.17540

Postersum: A multimodal benchmark for scientific poster summarization. arXiv preprint arXiv:2502.17540. Seo, M.; Baek, J.; Lee, S.; and Hwang, S. J

work page arXiv
[14]

Paper2code: Automating code generation from scientific papers in machine learning.ArXiv, abs/2504.17192,

Pa- per2code: Automating code generation from scientific pa- pers in machine learning. arXiv preprint arXiv:2504.17192. Shao, Z.; Wang, P.; Zhu, Q.; Xu, R.; Song, J.; Bi, X.; Zhang, H.; Zhang, M.; Li, Y .; Wu, Y .; et al

work page arXiv
[15]

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Deepseekmath: Pushing the limits of mathematical reasoning in open lan- guage models. arXiv preprint arXiv:2402.03300. Sun, T.; Pan, E.; Yang, Z.; Sui, K.; Shi, J.; Cheng, X.; Li, T.; Huang, W.; Zhang, G.; Yang, J.; et al

work page internal anchor Pith review Pith/arXiv arXiv
[16]

arXiv preprint arXiv:2505.17104

P2P: Auto- mated Paper-to-Poster Generation and Fine-Grained Bench- mark. arXiv preprint arXiv:2505.17104. Team, G.; Anil, R.; Borgeaud, S.; Alayrac, J.-B.; Yu, J.; Sori- cut, R.; Schalkwyk, J.; Dai, A. M.; Hauth, A.; Millican, K.; et al

work page arXiv
[17]

Gemini: A Family of Highly Capable Multimodal Models

Gemini: a family of highly capable multimodal models. arXiv preprint arXiv:2312.11805. Team, G.; Mesnard, T.; Hardin, C.; Dadashi, R.; Bhupati- raju, S.; Pathak, S.; Sifre, L.; Rivi`ere, M.; Kale, M. S.; Love, J.; et al

work page internal anchor Pith review Pith/arXiv arXiv
[18]

Gemma: Open Models Based on Gemini Research and Technology

Gemma: Open models based on gemini re- search and technology. arXiv preprint arXiv:2403.08295. Tran, K.-T.; Dao, D.; Nguyen, M.-D.; Pham, Q.-V .; O’Sullivan, B.; and Nguyen, H. D

work page internal anchor Pith review Pith/arXiv arXiv
[19]

Multi-Agent Collaboration Mechanisms: A Survey of LLMs

Multi-agent col- laboration mechanisms: A survey of llms. arXiv preprint arXiv:2501.06322. Wang, H.; Tanaka, S.; and Ushiku, Y

work page internal anchor Pith review Pith/arXiv arXiv
[20]

arXiv preprint arXiv:2112.08550

Neural content extraction for poster generation of scientific papers. arXiv preprint arXiv:2112.08550. Yao, S.; Yu, D.; Zhao, J.; Shafran, I.; Griffiths, T.; Cao, Y .; and Narasimhan, K

work page arXiv
[21]

arXiv preprint arXiv:2412.17767

Researchtown: Simulator of hu- man research community. arXiv preprint arXiv:2412.17767. Zhang, Y .; Sun, R.; Chen, Y .; Pfister, T.; Zhang, R.; and Arik, S. 2024a. Chain of agents: Large language models collabo- rating on long-context tasks. Advances in Neural Informa- tion Processing Systems, 37: 132208–132237. Zhang, Y .; Zhang, Z.; Lai, W.; Zhang, C.; ...

work page arXiv 2024
[22]

Multimodal Chain-of-Thought Reasoning in Language Models

Multimodal chain-of-thought reasoning in language models. arXiv preprint arXiv:2302.00923. Zhao, Z.; Kang, H.; Wang, B.; and He, C

work page internal anchor Pith review Pith/arXiv arXiv
[23]

DocLayout-YOLO: Enhancing document layout analysis through diverse synthetic data and global-to-local adaptive perception

Doclayout- yolo: Enhancing document layout analysis through diverse synthetic data and global-to-local adaptive perception.arXiv preprint arXiv:2410.12628. Zheng, G.; Zhou, X.; Li, X.; Qi, Z.; Shan, Y .; and Li, X

work page arXiv
[24]

ID": "2401.13641

Pptagent: Gen- erating and evaluating presentations beyond text-to-slides. arXiv preprint arXiv:2501.03936. Supplementary Material A Additional Qualitative Results Additional results for scientific poster generation are pre- sented in Figure A1 and Figure A2. B Experimental Details B.1 Qualitative Experiments Setup Standardization. To ensure a fair compar...

work page arXiv 2025
[25]

Papers were filtered to in- clude the latest camera-ready versions and to ensure a broad distribution across years and venues

was used when curating the benchmark. Papers were filtered to in- clude the latest camera-ready versions and to ensure a broad distribution across years and venues. This benchmark pro- vides a challenging setting for evaluating poster generation methods in terms of both informativeness and layout quality. B.3 Implementation Details All experiments were co...

work page 2024

[1] [1]

GPT-4 Technical Report

Gpt-4 technical report. arXiv preprint arXiv:2303.08774. Bai, J.; Bai, S.; Chu, Y .; Cui, Z.; Dang, K.; Deng, X.; Fan, Y .; Ge, W.; Han, Y .; Huang, F.; et al

work page internal anchor Pith review Pith/arXiv arXiv

[2] [2]

Qwen Technical Report

Qwen technical report. arXiv preprint arXiv:2309.16609. Besta, M.; Blach, N.; Kubicek, A.; Gerstenberger, R.; Podstawski, M.; Gianinazzi, L.; Gajda, J.; Lehmann, T.; Niewiadomski, H.; Nyczyk, P.; et al

work page internal anchor Pith review Pith/arXiv arXiv

[3] [3]

Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks

Program of thoughts prompting: Disentangling computa- tion from reasoning for numerical reasoning tasks. arXiv preprint arXiv:2211.12588. Du, Y .; Li, S.; Torralba, A.; Tenenbaum, J. B.; and Mordatch, I

work page internal anchor Pith review Pith/arXiv arXiv

[4] [4]

MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework

Metagpt: Meta programming for multi-agent collaborative framework. arXiv preprint arXiv:2308.00352, 3(4):

work page internal anchor Pith review Pith/arXiv arXiv

[5] [5]

arXiv preprint arXiv:2505.23885

Owl: Optimized work- force learning for general multi-agent assistance in real- world task automation. arXiv preprint arXiv:2505.23885. Huang, Y .; Lv, T.; Cui, L.; Lu, Y .; and Wei, F

work page arXiv

[6] [6]

arXiv preprint arXiv:2405.20213

Postdoc: Generating poster from a long multimodal document using deep submodular optimization. arXiv preprint arXiv:2405.20213. Larsen, P.; and V on Ins, M

work page arXiv

[7] [7]

In Findings of the association for computational linguistics ACL 2024, 11286–11315

Prometheus-vision: Vision-language model as a judge for fine-grained evaluation. In Findings of the association for computational linguistics ACL 2024, 11286–11315. Li, G.; Hammoud, H.; Itani, H.; Khizbullin, D.; and Ghanem, B

work page 2024

[8] [8]

Chain of ideas: Revolutionizing research via novel idea development with llm agents

Chain of ideas: Revolutionizing research via novel idea development with llm agents. arXiv preprint arXiv:2410.13185. Liang, T.; He, Z.; Jiao, W.; Wang, X.; Wang, Y .; Wang, R.; Yang, Y .; Shi, S.; and Tu, Z

work page arXiv

[9] [9]

Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate

Encouraging divergent thinking in large language models through multi-agent de- bate. arXiv preprint arXiv:2305.19118. Lin, J.; Guo, J.; Sun, S.; Yang, Z.; Lou, J.-G.; and Zhang, D

work page internal anchor Pith review Pith/arXiv arXiv

[10] [10]

In Proceedings of the 2004 conference on empiri- cal methods in natural language processing, 404–411

Textrank: Bringing order into text. In Proceedings of the 2004 conference on empiri- cal methods in natural language processing, 404–411. Pang, W.; Lin, K. Q.; Jian, X.; He, X.; and Torr, P

work page 2004

[11] [11]

arXiv preprint arXiv:2505.21497

Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers. arXiv preprint arXiv:2505.21497. Qian, C.; Liu, W.; Liu, H.; Chen, N.; Dang, Y .; Li, J.; Yang, C.; Chen, W.; Su, Y .; Cong, X.; et al

work page arXiv

[12] [12]

ChatDev: Communicative Agents for Software Development

Chatdev: Com- municative agents for software development. arXiv preprint arXiv:2307.07924. Qiang, Y .; Fu, Y .; Guo, Y .; Zhou, Z.-H.; and Sigal, L

work page internal anchor Pith review Pith/arXiv arXiv

[13] [13]

arXiv preprint arXiv:2502.17540

Postersum: A multimodal benchmark for scientific poster summarization. arXiv preprint arXiv:2502.17540. Seo, M.; Baek, J.; Lee, S.; and Hwang, S. J

work page arXiv

[14] [14]

Paper2code: Automating code generation from scientific papers in machine learning.ArXiv, abs/2504.17192,

Pa- per2code: Automating code generation from scientific pa- pers in machine learning. arXiv preprint arXiv:2504.17192. Shao, Z.; Wang, P.; Zhu, Q.; Xu, R.; Song, J.; Bi, X.; Zhang, H.; Zhang, M.; Li, Y .; Wu, Y .; et al

work page arXiv

[15] [15]

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Deepseekmath: Pushing the limits of mathematical reasoning in open lan- guage models. arXiv preprint arXiv:2402.03300. Sun, T.; Pan, E.; Yang, Z.; Sui, K.; Shi, J.; Cheng, X.; Li, T.; Huang, W.; Zhang, G.; Yang, J.; et al

work page internal anchor Pith review Pith/arXiv arXiv

[16] [16]

arXiv preprint arXiv:2505.17104

P2P: Auto- mated Paper-to-Poster Generation and Fine-Grained Bench- mark. arXiv preprint arXiv:2505.17104. Team, G.; Anil, R.; Borgeaud, S.; Alayrac, J.-B.; Yu, J.; Sori- cut, R.; Schalkwyk, J.; Dai, A. M.; Hauth, A.; Millican, K.; et al

work page arXiv

[17] [17]

Gemini: A Family of Highly Capable Multimodal Models

Gemini: a family of highly capable multimodal models. arXiv preprint arXiv:2312.11805. Team, G.; Mesnard, T.; Hardin, C.; Dadashi, R.; Bhupati- raju, S.; Pathak, S.; Sifre, L.; Rivi`ere, M.; Kale, M. S.; Love, J.; et al

work page internal anchor Pith review Pith/arXiv arXiv

[18] [18]

Gemma: Open Models Based on Gemini Research and Technology

Gemma: Open models based on gemini re- search and technology. arXiv preprint arXiv:2403.08295. Tran, K.-T.; Dao, D.; Nguyen, M.-D.; Pham, Q.-V .; O’Sullivan, B.; and Nguyen, H. D

work page internal anchor Pith review Pith/arXiv arXiv

[19] [19]

Multi-Agent Collaboration Mechanisms: A Survey of LLMs

Multi-agent col- laboration mechanisms: A survey of llms. arXiv preprint arXiv:2501.06322. Wang, H.; Tanaka, S.; and Ushiku, Y

work page internal anchor Pith review Pith/arXiv arXiv

[20] [20]

arXiv preprint arXiv:2112.08550

Neural content extraction for poster generation of scientific papers. arXiv preprint arXiv:2112.08550. Yao, S.; Yu, D.; Zhao, J.; Shafran, I.; Griffiths, T.; Cao, Y .; and Narasimhan, K

work page arXiv

[21] [21]

arXiv preprint arXiv:2412.17767

Researchtown: Simulator of hu- man research community. arXiv preprint arXiv:2412.17767. Zhang, Y .; Sun, R.; Chen, Y .; Pfister, T.; Zhang, R.; and Arik, S. 2024a. Chain of agents: Large language models collabo- rating on long-context tasks. Advances in Neural Informa- tion Processing Systems, 37: 132208–132237. Zhang, Y .; Zhang, Z.; Lai, W.; Zhang, C.; ...

work page arXiv 2024

[22] [22]

Multimodal Chain-of-Thought Reasoning in Language Models

Multimodal chain-of-thought reasoning in language models. arXiv preprint arXiv:2302.00923. Zhao, Z.; Kang, H.; Wang, B.; and He, C

work page internal anchor Pith review Pith/arXiv arXiv

[23] [23]

DocLayout-YOLO: Enhancing document layout analysis through diverse synthetic data and global-to-local adaptive perception

Doclayout- yolo: Enhancing document layout analysis through diverse synthetic data and global-to-local adaptive perception.arXiv preprint arXiv:2410.12628. Zheng, G.; Zhou, X.; Li, X.; Qi, Z.; Shan, Y .; and Li, X

work page arXiv

[24] [24]

ID": "2401.13641

Pptagent: Gen- erating and evaluating presentations beyond text-to-slides. arXiv preprint arXiv:2501.03936. Supplementary Material A Additional Qualitative Results Additional results for scientific poster generation are pre- sented in Figure A1 and Figure A2. B Experimental Details B.1 Qualitative Experiments Setup Standardization. To ensure a fair compar...

work page arXiv 2025

[25] [25]

Papers were filtered to in- clude the latest camera-ready versions and to ensure a broad distribution across years and venues

was used when curating the benchmark. Papers were filtered to in- clude the latest camera-ready versions and to ensure a broad distribution across years and venues. This benchmark pro- vides a challenging setting for evaluating poster generation methods in terms of both informativeness and layout quality. B.3 Implementation Details All experiments were co...

work page 2024