When Diffusion Breaks Constraints: Sequential Autoregressive Generation with RL and MCTS

Boye Niu; David Hsu; Harold Soh; Wee Sun Lee; Zirui Zhao

arxiv: 2512.01242 · v3 · pith:6M6GD7VDnew · submitted 2025-12-01 · 💻 cs.CV · cs.AI· cs.CL

When Diffusion Breaks Constraints: Sequential Autoregressive Generation with RL and MCTS

Zirui Zhao , Boye Niu , Harold Soh , David Hsu , Wee Sun Lee This is my paper

Pith reviewed 2026-05-17 03:40 UTC · model grok-4.3

classification 💻 cs.CV cs.AIcs.CL

keywords diffusion modelsconstrained generationautoregressive generationreinforcement learningMonte Carlo tree searchtangram generationgeometric constraintsplanning and design tasks

0 comments

The pith

Diffusion models fail to satisfy strict geometric constraints in planning tasks because continuous density matching cannot target low-dimensional feasible regions, while reformulating generation as sequential discrete choices with RL and M-

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines why diffusion models produce invalid outputs in constrained tasks such as assembling tangram pieces into a language-described silhouette without overlaps or disconnections. Experiments on tangram generation and a simplified rectangle composition task with a learned bounding-box constraint show that diffusion struggles even with guidance or projection, as feasible solutions occupy small low-dimensional areas of the output space. The authors reformulate the problem as building outputs step by step through discrete autoregressive decisions. Reinforcement learning then trains policies to favor feasible configurations, and Monte Carlo tree search measures how much planning ahead helps when feasible regions become smaller. The combined evidence indicates a structural limitation for continuous density matching on this class of problems and points to sequential constraint-aware generation as an alternative.

Core claim

Diffusion models struggle to satisfy constraints in tasks such as tangram generation from language and rectangle composition with bounding-box constraints, consistent with difficulty generating samples near low-dimensional submanifolds. Reformulating constrained generation as discrete autoregressive sequential generation allows reinforcement learning to improve feasibility and task success, and Monte Carlo tree search to quantify the value of look-ahead when feasible regions shrink. The empirical, theoretical, and prior-work evidence points to a structural limitation of continuous density matching on this class of constrained-generation problems.

What carries the argument

Reformulation of constrained generation as discrete autoregressive sequential generation, where reinforcement learning improves feasibility and Monte Carlo tree search quantifies the value of look-ahead.

If this is right

Diffusion models will continue to exhibit severe constraint violations in engineering inverse design, molecular generation, multi-robot planning, and floorplan synthesis even when projection or guidance is applied.
Sequential autoregressive methods achieve higher rates of feasible and task-successful outputs on problems that require non-overlap, connectivity, and other geometric constraints.
The benefit of Monte Carlo tree search increases as the size of feasible regions shrinks in constrained tasks.
Locally feasible reparameterizations provide the motivation for moving from continuous density matching to discrete sequential modeling.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The sequential RL approach could be applied directly to molecular generation to enforce physical constraints that diffusion models routinely violate.
Hybrid pipelines might first use diffusion to propose creative starting points and then switch to sequential refinement to meet hard constraints in scene or floorplan synthesis.
Evaluating the method on multi-robot coordination tasks would test whether the tangram results generalize to problems involving dynamic spatial constraints.

Load-bearing premise

The failure modes observed in tangram generation from language and the simplified rectangle composition task are representative of the broader class of constrained planning and design tasks such as engineering inverse design, molecular generation, and multi-robot planning.

What would settle it

A controlled experiment in which diffusion models with guidance or projection achieve feasibility rates on tangram or molecular generation tasks that match or exceed those of the sequential RL-plus-MCTS method would falsify the structural-limitation claim.

Figures

Figures reproduced from arXiv: 2512.01242 by Boye Niu, David Hsu, Harold Soh, Wee Sun Lee, Zirui Zhao.

**Figure 3.** Figure 3: Demonstration of the proposed Generative Adversarial [PITH_FULL_IMAGE:figures/full_fig_p003_3.png] view at source ↗

**Figure 2.** Figure 2: The demo of the actions in Tangram assembly. (a) The [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 4.** Figure 4: The policy value network. We use a pre-trained Vision [PITH_FULL_IMAGE:figures/full_fig_p004_4.png] view at source ↗

**Figure 5.** Figure 5: The demonstration of the dataset and the generated tan [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗

**Figure 6.** Figure 6: One example question in the questionnaire. We ran [PITH_FULL_IMAGE:figures/full_fig_p015_6.png] view at source ↗

read the original abstract

Data-driven generative models excel in language and vision, but diffusion models often fail in constrained planning and design tasks, exhibiting severe constraint violations in engineering inverse design, molecular generation, multi-robot planning, and floorplan/scene synthesis even with projection or guidance. Such tasks combine hard-to-specify semantic goals with strict geometric or physical constraints (e.g., non-overlap, connectivity), yielding feasible solutions that lie on low-dimensional, small, and sometimes disconnected regions of the output space. This paper studies the failure mode through tangram generation from language, where seven fixed shapes must form a text-described silhouette while remaining connected and non-overlapping, and a simplified rectangle composition task with a learned bounding-box constraint. We find diffusion models struggle to satisfy constraints, consistent with difficulty generating samples near low-dimensional submanifolds. Motivated by locally feasible reparameterizations, we reformulate constrained generation as discrete autoregressive sequential generation. Reinforcement learning improves feasibility and task success, and Monte Carlo tree search quantifies the value of look-ahead when feasible regions shrink. Overall, the empirical, theoretical, and prior-work evidence points to a structural limitation of continuous density matching on this class of constrained-generation problems, and suggests sequential constraint-aware generation as a promising alternative.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that diffusion models exhibit structural limitations when generating samples in constrained planning and design tasks whose feasible sets are low-dimensional, small, or disconnected. It demonstrates this via language-conditioned tangram generation (seven fixed shapes forming a silhouette while satisfying connectivity and non-overlap) and a simplified rectangle composition task with a learned bounding-box constraint, then proposes reformulating the problem as discrete autoregressive sequential generation that is improved by reinforcement learning for feasibility and Monte Carlo tree search for look-ahead value.

Significance. If the central claim holds, the work supplies concrete empirical evidence and geometric motivation for why continuous density matching struggles on this class of problems and identifies sequential constraint-aware generation as a viable alternative. The use of RL and MCTS to improve feasibility on the proposed tasks is a concrete, reproducible strength that could inform hybrid methods in molecular generation, robotics, and inverse design.

major comments (2)

[§4] §4 (Tangram experiments): the claim that observed constraint violations demonstrate a structural limitation of continuous density matching would be strengthened by an explicit control that varies the dimensionality of the feasible set while holding the diffusion architecture fixed; without it, the results remain consistent with the specific discrete geometry of tangram rather than the paradigm itself.
[§5.2] §5.2 (Rectangle composition): the learned bounding-box constraint is a simplified proxy; the manuscript should report the fraction of samples that satisfy the constraint under standard diffusion guidance versus the sequential method, with error bars across at least three random seeds, to make the quantitative gap load-bearing for the broader conclusion.

minor comments (2)

[§3] The abstract states that diffusion models fail 'even with projection or guidance' but does not cite the specific guidance or projection baselines used in the tangram and rectangle experiments; add these references in §3.
[§6] Notation for the autoregressive factorization (e.g., the decomposition of the joint into sequential conditionals) should be introduced with an equation number in §6 to improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and positive assessment of the significance of our findings. The comments highlight opportunities to strengthen the evidence for a structural limitation of continuous density matching. We address each major comment below and commit to revisions that directly respond to the concerns raised.

read point-by-point responses

Referee: [§4] §4 (Tangram experiments): the claim that observed constraint violations demonstrate a structural limitation of continuous density matching would be strengthened by an explicit control that varies the dimensionality of the feasible set while holding the diffusion architecture fixed; without it, the results remain consistent with the specific discrete geometry of tangram rather than the paradigm itself.

Authors: We agree that an explicit control varying feasible-set dimensionality while fixing the diffusion architecture would isolate the effect more cleanly. The tangram task was selected precisely because its feasible region is low-dimensional (seven rigid shapes subject to connectivity and non-overlap), but we recognize that the discrete geometry could be a confounding factor. In the revision we will add an ablation on the rectangle-composition task in which we vary the number of rectangles (and thus the intrinsic dimensionality of the feasible set) while keeping the identical diffusion backbone, training procedure, and guidance mechanism. This will provide the requested control and allow direct comparison of violation rates as dimensionality changes. revision: yes
Referee: [§5.2] §5.2 (Rectangle composition): the learned bounding-box constraint is a simplified proxy; the manuscript should report the fraction of samples that satisfy the constraint under standard diffusion guidance versus the sequential method, with error bars across at least three random seeds, to make the quantitative gap load-bearing for the broader conclusion.

Authors: We accept this recommendation. The current manuscript reports aggregate success rates but does not include per-seed fractions or error bars for the exact constraint-satisfaction metric. In the revised version we will add a table (or expanded figure caption) in §5.2 that reports the mean fraction of samples satisfying the learned bounding-box constraint for both standard diffusion guidance and the sequential RL+MCTS method, together with standard deviation across at least three independent random seeds. This change will make the quantitative gap statistically more robust and directly address the concern that the bounding-box task is only a simplified proxy. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation grounded in empirical observations and geometric arguments

full rationale

The paper motivates its claim of a structural limitation in continuous density matching by citing observed failures of diffusion models on the tangram and rectangle tasks, along with geometric arguments that feasible solutions occupy low-dimensional submanifolds. The shift to sequential autoregressive generation with RL and MCTS is presented as a direct response to these empirical patterns and reparameterization ideas rather than any quantity defined in terms of fitted parameters, self-referential equations, or load-bearing self-citations. No derivation step reduces to its inputs by construction, and the central conclusion retains independent content from the reported experiments and prior-work evidence.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that feasible solutions occupy low-dimensional submanifolds, with no free parameters or invented entities introduced in the abstract.

axioms (1)

domain assumption Feasible solutions in constrained generation tasks lie on low-dimensional, small, and sometimes disconnected regions of the output space
Invoked in the abstract to explain diffusion model failures on tasks like tangram and rectangle composition.

pith-pipeline@v0.9.0 · 5528 in / 1284 out tokens · 87410 ms · 2026-05-17T03:40:48.393355+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

diffusion models often fail in constrained planning and design tasks, exhibiting severe constraint violations... feasible solutions that lie on low-dimensional, small, and sometimes disconnected regions
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean LogicNat recovery theorem unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

reformulate constrained generation as discrete autoregressive sequential generation. Reinforcement learning improves feasibility

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

58 extracted references · 58 canonical work pages · 2 internal anchors

[1]

Shape related constraints aware gen- eration of mechanical designs through deep convolutional gan.arXiv preprint arXiv:2010.11833, 2020

Waad Almasri, Dimitri Bettebghor, Fakhreddine Ababsa, and Florence Danglade. Shape related constraints aware gen- eration of mechanical designs through deep convolutional gan.arXiv preprint arXiv:2010.11833, 2020. 2

work page arXiv 2010
[2]

Lexical entrainment without conceptual pacts? revisiting the matching task.Journal of Memory and Language, 114: 104129, 2020

Adrian Bangerter, Eric Mayor, and Dominique Knutsen. Lexical entrainment without conceptual pacts? revisiting the matching task.Journal of Memory and Language, 114: 104129, 2020. 2

work page 2020
[3]

Recognition-by-components: A theory of human image understanding.Psychological Review, 94(2): 115–147, 1987

Irving Biederman. Recognition-by-components: A theory of human image understanding.Psychological Review, 94(2): 115–147, 1987. 1

work page 1987
[4]

Ab- stract concepts: External influences, internal constraints, and methodological issues.Psychological Research, 86(8): 2370–2388, 2022

Anna M Borghi, Samuel Shaki, and Martin H Fischer. Ab- stract concepts: External influences, internal constraints, and methodological issues.Psychological Research, 86(8): 2370–2388, 2022. 2

work page 2022
[5]

Interac- tion promotes the adaptation of referential conventions to the communicative context.Cognitive science, 43(8):e12780,

Luc ´ıa Castillo, Kenny Smith, and Holly P Branigan. Interac- tion promotes the adaptation of referential conventions to the communicative context.Cognitive science, 43(8):e12780,

work page
[6]

Decision transformer: Reinforce- ment learning via sequence modeling.Advances in neural information processing systems, 34:15084–15097, 2021

Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Misha Laskin, Pieter Abbeel, Aravind Srini- vas, and Igor Mordatch. Decision transformer: Reinforce- ment learning via sequence modeling.Advances in neural information processing systems, 34:15084–15097, 2021. 13

work page 2021
[7]

Cops-ref: A new dataset and task on composi- tional referring expression comprehension

Zhenfang Chen, Peng Wang, Lin Ma, Kwan-Yee K Wong, and Qi Wu. Cops-ref: A new dataset and task on composi- tional referring expression comprehension. InProceedings of the IEEE/CVF Conference on Computer Vision and Pat- tern Recognition, pages 10086–10095, 2020. 2

work page 2020
[8]

Constrained synthesis with projected diffusion models.Ad- vances in Neural Information Processing Systems, 37: 89307–89333, 2024

Jacob K Christopher, Stephen Baek, and Nando Fioretto. Constrained synthesis with projected diffusion models.Ad- vances in Neural Information Processing Systems, 37: 89307–89333, 2024. 2

work page 2024
[9]

Referring as a collaborative process.Cognition, 22(1):1–39, 1986

Herbert H Clark and Deanna Wilkes-Gibbs. Referring as a collaborative process.Cognition, 22(1):1–39, 1986. 2

work page 1986
[10]

Visual blending for concept representation: A case study on emoji generation.New Generation Comput- ing, 38(4):739–771, 2020

Jo ˜ao M Cunha, Nuno Lourenc ¸o, Pedro Martins, and Pe- nousal Machado. Visual blending for concept representation: A case study on emoji generation.New Generation Comput- ing, 38(4):739–771, 2020. 2

work page 2020
[11]

Policy improvement by planning with gumbel

Ivo Danihelka, Arthur Guez, Julian Schrittwieser, and David Silver. Policy improvement by planning with gumbel. InIn- ternational Conference on Learning Representations, 2022. 12

work page 2022
[12]

Diffusion models beat gans on image synthesis.Advances in neural informa- tion processing systems, 34:8780–8794, 2021

Prafulla Dhariwal and Alexander Nichol. Diffusion models beat gans on image synthesis.Advances in neural informa- tion processing systems, 34:8780–8794, 2021. 5

work page 2021
[13]

Development of shared information in communication despite hippocampal amnesia.Nature neuroscience, 9(1): 140–146, 2006

Melissa C Duff, Julie Hengst, Daniel Tranel, and Neal J Co- hen. Development of shared information in communication despite hippocampal amnesia.Nature neuroscience, 9(1): 140–146, 2006. 2

work page 2006
[14]

Con- strained layout generation with factor graphs

Mohammed Haroon Dupty, Yanfei Dong, Sicong Leng, Guoji Fu, Yong Liang Goh, Wei Lu, and Wee Sun Lee. Con- strained layout generation with factor graphs. InProceedings of the IEEE/CVF Conference on Computer Vision and Pat- tern Recognition, pages 12851–12860, 2024. 2

work page 2024
[15]

Clipdraw: Exploring text-to-drawing synthesis through language-image encoders.Advances in Neural Information Processing Sys- tems, 35:5207–5218, 2022

Kevin Frans, Lisa Soros, and Olaf Witkowski. Clipdraw: Exploring text-to-drawing synthesis through language-image encoders.Advances in Neural Information Processing Sys- tems, 35:5207–5218, 2022. 2

work page 2022
[16]

Garey and David S

Michael R. Garey and David S. Johnson.Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman & Co., San Francisco, CA, 1979. 1

work page 1979
[17]

Visual conceptual blending with large-scale language and vision models.arXiv preprint arXiv:2106.14127, 2021

Songwei Ge and Devi Parikh. Visual conceptual blending with large-scale language and vision models.arXiv preprint arXiv:2106.14127, 2021. 2

work page arXiv 2021
[18]

Statistical theory of extreme valuse and some practical applications.Nat

Emil Julius Gumbel. Statistical theory of extreme valuse and some practical applications.Nat. Bur. Standards Appl. Math. Ser. 33, 1954. 12

work page 1954
[19]

Denoising dif- fusion probabilistic models

Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising dif- fusion probabilistic models. InAdvances in Neural Infor- mation Processing Systems (NeurIPS 33), pages 6840–6851,

work page
[20]

Co-speech gesture mimicry in the process of collaborative referring during face-to-face dialogue.Journal of nonverbal behavior, 35:133–153, 2011

Judith Holler and Katie Wilkin. Co-speech gesture mimicry in the process of collaborative referring during face-to-face dialogue.Journal of nonverbal behavior, 35:133–153, 2011. 2

work page 2011
[21]

Speakers’ expe- riences and audience design: Knowing when and knowing how to adjust utterances to addressees.Journal of Memory and Language, 47(4):589–606, 2002

William S Horton and Richard J Gerrig. Speakers’ expe- riences and audience design: Knowing when and knowing how to adjust utterances to addressees.Journal of Memory and Language, 47(4):589–606, 2002. 2

work page 2002
[22]

de Lima, Silvano Martello, Fl´avio K

Manuel Iori, Vin ´ıcius L. de Lima, Silvano Martello, Fl´avio K. Miyazawa, and Michele Monaci. Exact solution techniques for two-dimensional cutting and packing.Eu- ropean Journal of Operational Research, 289(2):399–415,

work page
[23]

Vectorfusion: Text-to-svg by abstracting pixel-based diffusion models

Ajay Jain, Amber Xie, and Pieter Abbeel. Vectorfusion: Text-to-svg by abstracting pixel-based diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition, pages 1911–1920, 2023. 2

work page 1911
[24]

Categorical Reparameterization with Gumbel-Softmax

Eric Jang, Shixiang Gu, and Ben Poole. Categorical reparameterization with gumbel-softmax.arXiv preprint arXiv:1611.01144, 2016. 12

work page internal anchor Pith review Pith/arXiv arXiv 2016
[25]

Abstract visual reasoning with tangram shapes

Anya Ji, Noriyuki Kojima, Noah Rush, Alane Suhr, Wai Keen V ong, Robert Hawkins, and Yoav Artzi. Abstract visual reasoning with tangram shapes. InProceedings of the 2022 Conference on Empirical Methods in Natural Lan- guage Processing, pages 582–601, 2022. 2

work page 2022
[26]

Scaling up visual and vision-language representa- tion learning with noisy text supervision

Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc Le, Yun-Hsuan Sung, Zhen Li, and Tom Duerig. Scaling up visual and vision-language representa- tion learning with noisy text supervision. InInternational conference on machine learning, pages 4904–4916. PMLR,

work page
[27]

Almost op- timal exploration in multi-armed bandits

Zohar Karnin, Tomer Koren, and Oren Somekh. Almost op- timal exploration in multi-armed bandits. InInternational 9 conference on machine learning, pages 1238–1246. PMLR,

work page
[28]

Vilt: Vision- and-language transformer without convolution or region su- pervision

Wonjae Kim, Bokyung Son, and Ildoo Kim. Vilt: Vision- and-language transformer without convolution or region su- pervision. InInternational conference on machine learning, pages 5583–5594. PMLR, 2021. 2

work page 2021
[29]

Stochastic beams and where to find them: The gumbel-top-k trick for sampling sequences without replacement

Wouter Kool, Herke Van Hoof, and Max Welling. Stochastic beams and where to find them: The gumbel-top-k trick for sampling sequences without replacement. InInternational Conference on Machine Learning, pages 3499–3508. PMLR,

work page
[30]

Changes in ref- erence phrases as a function of frequency of usage in social interaction: A preliminary study.Psychonomic Science, 1: 113–114, 1964

Robert M Krauss and Sidney Weinheimer. Changes in ref- erence phrases as a function of frequency of usage in social interaction: A preliminary study.Psychonomic Science, 1: 113–114, 1964. 2

work page 1964
[31]

Differentiable vector graphics rasterization for editing and learning.ACM Transactions on Graphics (TOG), 39(6):1–15, 2020

Tzu-Mao Li, Michal Luk ´aˇc, Micha ¨el Gharbi, and Jonathan Ragan-Kelley. Differentiable vector graphics rasterization for editing and learning.ACM Transactions on Graphics (TOG), 39(6):1–15, 2020. 2

work page 2020
[32]

Text-to-image genera- tion for abstract concepts

Jiayi Liao, Xu Chen, Qiang Fu, Lun Du, Xiangnan He, Xiang Wang, Shi Han, and Dongmei Zhang. Text-to-image genera- tion for abstract concepts. InProceedings of the AAAI Con- ference on Artificial Intelligence, pages 3360–3368, 2024. 2

work page 2024
[33]

Mirror diffusion models for constrained and wa- termarked generation

Guan-Horng Liu, Tianrong Chen, Evangelos Theodorou, and Molei Tao. Mirror diffusion models for constrained and wa- termarked generation. InThirty-seventh Conference on Neu- ral Information Processing Systems, 2023. 2

work page 2023
[34]

Opal: Multimodal image generation for news illustration

Vivian Liu, Han Qiao, and Lydia Chilton. Opal: Multimodal image generation for news illustration. InProceedings of the 35th Annual ACM Symposium on User Interface Software and Technology, pages 1–17, 2022. 2

work page 2022
[35]

Object- centric learning with slot attention

Francesco Locatello, Dirk Weissenborn, Thomas Un- terthiner, Aravindh Mahendran, Georg Heigold, Jakob Uszkoreit, Alexey Dosovitskiy, and Thomas Kipf. Object- centric learning with slot attention. InAdvances in Neural In- formation Processing Systems (NeurIPS 33), pages 11525– 11538, 2020. 2

work page 2020
[36]

Vilbert: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks.Advances in neural information processing systems, 32, 2019

Jiasen Lu, Dhruv Batra, Devi Parikh, and Stefan Lee. Vilbert: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks.Advances in neural information processing systems, 32, 2019. 2

work page 2019
[37]

Towards layer- wise image vectorization

Xu Ma, Yuqian Zhou, Xingqian Xu, Bin Sun, Valerii Filev, Nikita Orlov, Yun Fu, and Humphrey Shi. Towards layer- wise image vectorization. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16314–16323, 2022. 2

work page 2022
[38]

Guglielmo Padula, Francesco Romor, Giovanni Stabile, and Gianluigi Rozza. Generative models for the deformation of industrial shapes with linear geometric constraints: Model order and parameter space reductions.Computer Methods in Applied Mechanics and Engineering, 423:116823, 2024. 2

work page 2024
[39]

Sketchgen: Generating constrained cad sketches.Advances in Neural Information Processing Systems, 34:5077–5088, 2021

Wamiq Para, Shariq Bhat, Paul Guerrero, Tom Kelly, Niloy Mitra, Leonidas J Guibas, and Peter Wonka. Sketchgen: Generating constrained cad sketches.Advances in Neural Information Processing Systems, 34:5077–5088, 2021. 2

work page 2021
[40]

Learning transferable visual models from natural language supervision

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision. InProceedings of the 38th International Conference on Machine Learning (ICML) 2021, pages 8748–8763, 2021. 2

work page 2021
[41]

Learn- ing transferable visual models from natural language super- vision

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learn- ing transferable visual models from natural language super- vision. InInternational Conference on Machine Learning, pages 8748–8763. PMLR, 2021. 2, 13

work page 2021
[42]

High-resolution image synthesis with latent diffusion models

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Bj ¨orn Ommer. High-resolution image synthesis with latent diffusion models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022, page †, 2022. † Fill in actual page numbers. 1

work page 2022
[43]

Styleclipdraw: Coupling content and style in text-to-drawing translation.arXiv preprint arXiv:2202.12362, 2022

Peter Schaldenbrand, Zhixuan Liu, and Jean Oh. Styleclip- draw: Coupling content and style in text-to-drawing transla- tion.arXiv preprint arXiv:2202.12362, 2022. 2

work page arXiv 2022
[44]

Mastering atari, go, chess and shogi by planning with a learned model.Nature, 588(7839):604–609, 2020

Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, et al. Mastering atari, go, chess and shogi by planning with a learned model.Nature, 588(7839):604–609, 2020. 13

work page 2020
[45]

Proximal Policy Optimization Algorithms

John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Rad- ford, and Oleg Klimov. Proximal policy optimization algo- rithms.arXiv preprint arXiv:1707.06347, 2017. 2

work page internal anchor Pith review Pith/arXiv arXiv 2017
[46]

Mastering the game of go without human knowledge.nature, 550(7676): 354–359, 2017

David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lu- cas Baker, Matthew Lai, Adrian Bolton, et al. Mastering the game of go without human knowledge.nature, 550(7676): 354–359, 2017. 12

work page 2017
[47]

A general rein- forcement learning algorithm that masters chess, shogi, and go through self-play.Science, 362(6419):1140–1144, 2018

David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Lau- rent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lill- icrap, Karen Simonyan, and Demis Hassabis. A general rein- forcement learning algorithm that masters chess, shogi, and go through self-play.Science, 362(6419):1140–1144, 2018. 2

work page 2018
[48]

Creative blends of visual concepts.arXiv preprint arXiv:2502.16062,

Zhida Sun, Zhenyao Zhang, Yue Zhang, Min Lu, Dani Lischinski, Daniel Cohen-Or, and Hui Huang. Creative blends of visual concepts.arXiv preprint arXiv:2502.16062,

work page arXiv
[49]

Mcts based on simple regret

David Tolpin and Solomon Shimony. Mcts based on simple regret. InProceedings of the AAAI Conference on Artificial Intelligence, pages 570–576, 2012. 12

work page 2012
[50]

Untersuchungen zur lehre von der gestalt

Max Wertheimer. Untersuchungen zur lehre von der gestalt. ii.Psychologische Forschung, 4:301–350, 1923. In German. 1

work page 1923
[51]

Icon- shop: Text-guided vector icon synthesis with autoregressive transformers.ACM Transactions on Graphics (TOG), 42(6): 1–14, 2023

Ronghuan Wu, Wanchao Su, Kede Ma, and Jing Liao. Icon- shop: Text-guided vector icon synthesis with autoregressive transformers.ACM Transactions on Graphics (TOG), 42(6): 1–14, 2023. 2

work page 2023
[52]

Vismantic: Meaning- making with images

Ping Xiao and Simo Matias Linkola. Vismantic: Meaning- making with images. InInternational Conference on Com- putational Creativity, pages 158–165. Brigham Young Uni- versity, 2015. 2 10

work page 2015
[53]

Diffsketcher: Text guided vector sketch synthesis through latent diffusion models.Advances in Neu- ral Information Processing Systems, 36:15869–15889, 2023

Ximing Xing, Chuang Wang, Haitao Zhou, Jing Zhang, Qian Yu, and Dong Xu. Diffsketcher: Text guided vector sketch synthesis through latent diffusion models.Advances in Neu- ral Information Processing Systems, 36:15869–15889, 2023. 2

work page 2023
[54]

John I Yellott Jr. The relationship between luce’s choice ax- iom, thurstone’s theory of comparative judgment, and the double exponential distribution.Journal of Mathematical Psychology, 15(2):109–144, 1977. 12

work page 1977
[55]

Monte carlo tree diffusion for system 2 planning.arXiv preprint arXiv:2502.07202, 2025

Jaesik Yoon, Hyeonseo Cho, Doojin Baek, Yoshua Bengio, and Sungjin Ahn. Monte carlo tree diffusion for system 2 planning.arXiv preprint arXiv:2502.07202, 2025. 2

work page arXiv 2025
[56]

arXiv preprint arXiv:2502.05625 , year =

Stefano Zampini, Jacob Christopher, Luca Oneto, Davide Anguita, and Ferdinando Fioretto. Training-free constrained generation with stable diffusion models.arXiv preprint arXiv:2502.05625, 2025. 2

work page arXiv 2025
[57]

Enforcing impre- cise constraints on generative adversarial networks for emu- lating physical systems.Communications in Computational Physics, 30(3):635–665, 2021

Yang Zeng, Jin-Long Wu, and Heng Xiao. Enforcing impre- cise constraints on generative adversarial networks for emu- lating physical systems.Communications in Computational Physics, 30(3):635–665, 2021. 2 11 A. Preliminary: Gumbel Muzero Gumbel Muzero [11] is a variant of the AlphaZero [46] algorithm with a major modification in the MCTS algo- rithm. Inst...

work page 2021
[58]

Which image better matches the description below?

is a bandit algorithm for simple regret minimisation. It aims at maximising theQ-value of the last or the finally selected action in simulations, which differs from the cu- mulative regrets that maximise theQfrom allnsimulations [49]. Algorithm 2Sequential Halving with Gumbel 1:procedureSEQHALGUMBEL(s, N, K, g, π θ, f, σ)▷ s: current state;N: simulation b...

work page 2048

[1] [1]

Shape related constraints aware gen- eration of mechanical designs through deep convolutional gan.arXiv preprint arXiv:2010.11833, 2020

Waad Almasri, Dimitri Bettebghor, Fakhreddine Ababsa, and Florence Danglade. Shape related constraints aware gen- eration of mechanical designs through deep convolutional gan.arXiv preprint arXiv:2010.11833, 2020. 2

work page arXiv 2010

[2] [2]

Lexical entrainment without conceptual pacts? revisiting the matching task.Journal of Memory and Language, 114: 104129, 2020

Adrian Bangerter, Eric Mayor, and Dominique Knutsen. Lexical entrainment without conceptual pacts? revisiting the matching task.Journal of Memory and Language, 114: 104129, 2020. 2

work page 2020

[3] [3]

Recognition-by-components: A theory of human image understanding.Psychological Review, 94(2): 115–147, 1987

Irving Biederman. Recognition-by-components: A theory of human image understanding.Psychological Review, 94(2): 115–147, 1987. 1

work page 1987

[4] [4]

Ab- stract concepts: External influences, internal constraints, and methodological issues.Psychological Research, 86(8): 2370–2388, 2022

Anna M Borghi, Samuel Shaki, and Martin H Fischer. Ab- stract concepts: External influences, internal constraints, and methodological issues.Psychological Research, 86(8): 2370–2388, 2022. 2

work page 2022

[5] [5]

Interac- tion promotes the adaptation of referential conventions to the communicative context.Cognitive science, 43(8):e12780,

Luc ´ıa Castillo, Kenny Smith, and Holly P Branigan. Interac- tion promotes the adaptation of referential conventions to the communicative context.Cognitive science, 43(8):e12780,

work page

[6] [6]

Decision transformer: Reinforce- ment learning via sequence modeling.Advances in neural information processing systems, 34:15084–15097, 2021

Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Misha Laskin, Pieter Abbeel, Aravind Srini- vas, and Igor Mordatch. Decision transformer: Reinforce- ment learning via sequence modeling.Advances in neural information processing systems, 34:15084–15097, 2021. 13

work page 2021

[7] [7]

Cops-ref: A new dataset and task on composi- tional referring expression comprehension

Zhenfang Chen, Peng Wang, Lin Ma, Kwan-Yee K Wong, and Qi Wu. Cops-ref: A new dataset and task on composi- tional referring expression comprehension. InProceedings of the IEEE/CVF Conference on Computer Vision and Pat- tern Recognition, pages 10086–10095, 2020. 2

work page 2020

[8] [8]

Constrained synthesis with projected diffusion models.Ad- vances in Neural Information Processing Systems, 37: 89307–89333, 2024

Jacob K Christopher, Stephen Baek, and Nando Fioretto. Constrained synthesis with projected diffusion models.Ad- vances in Neural Information Processing Systems, 37: 89307–89333, 2024. 2

work page 2024

[9] [9]

Referring as a collaborative process.Cognition, 22(1):1–39, 1986

Herbert H Clark and Deanna Wilkes-Gibbs. Referring as a collaborative process.Cognition, 22(1):1–39, 1986. 2

work page 1986

[10] [10]

Visual blending for concept representation: A case study on emoji generation.New Generation Comput- ing, 38(4):739–771, 2020

Jo ˜ao M Cunha, Nuno Lourenc ¸o, Pedro Martins, and Pe- nousal Machado. Visual blending for concept representation: A case study on emoji generation.New Generation Comput- ing, 38(4):739–771, 2020. 2

work page 2020

[11] [11]

Policy improvement by planning with gumbel

Ivo Danihelka, Arthur Guez, Julian Schrittwieser, and David Silver. Policy improvement by planning with gumbel. InIn- ternational Conference on Learning Representations, 2022. 12

work page 2022

[12] [12]

Diffusion models beat gans on image synthesis.Advances in neural informa- tion processing systems, 34:8780–8794, 2021

Prafulla Dhariwal and Alexander Nichol. Diffusion models beat gans on image synthesis.Advances in neural informa- tion processing systems, 34:8780–8794, 2021. 5

work page 2021

[13] [13]

Development of shared information in communication despite hippocampal amnesia.Nature neuroscience, 9(1): 140–146, 2006

Melissa C Duff, Julie Hengst, Daniel Tranel, and Neal J Co- hen. Development of shared information in communication despite hippocampal amnesia.Nature neuroscience, 9(1): 140–146, 2006. 2

work page 2006

[14] [14]

Con- strained layout generation with factor graphs

Mohammed Haroon Dupty, Yanfei Dong, Sicong Leng, Guoji Fu, Yong Liang Goh, Wei Lu, and Wee Sun Lee. Con- strained layout generation with factor graphs. InProceedings of the IEEE/CVF Conference on Computer Vision and Pat- tern Recognition, pages 12851–12860, 2024. 2

work page 2024

[15] [15]

Clipdraw: Exploring text-to-drawing synthesis through language-image encoders.Advances in Neural Information Processing Sys- tems, 35:5207–5218, 2022

Kevin Frans, Lisa Soros, and Olaf Witkowski. Clipdraw: Exploring text-to-drawing synthesis through language-image encoders.Advances in Neural Information Processing Sys- tems, 35:5207–5218, 2022. 2

work page 2022

[16] [16]

Garey and David S

Michael R. Garey and David S. Johnson.Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman & Co., San Francisco, CA, 1979. 1

work page 1979

[17] [17]

Visual conceptual blending with large-scale language and vision models.arXiv preprint arXiv:2106.14127, 2021

Songwei Ge and Devi Parikh. Visual conceptual blending with large-scale language and vision models.arXiv preprint arXiv:2106.14127, 2021. 2

work page arXiv 2021

[18] [18]

Statistical theory of extreme valuse and some practical applications.Nat

Emil Julius Gumbel. Statistical theory of extreme valuse and some practical applications.Nat. Bur. Standards Appl. Math. Ser. 33, 1954. 12

work page 1954

[19] [19]

Denoising dif- fusion probabilistic models

Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising dif- fusion probabilistic models. InAdvances in Neural Infor- mation Processing Systems (NeurIPS 33), pages 6840–6851,

work page

[20] [20]

Co-speech gesture mimicry in the process of collaborative referring during face-to-face dialogue.Journal of nonverbal behavior, 35:133–153, 2011

Judith Holler and Katie Wilkin. Co-speech gesture mimicry in the process of collaborative referring during face-to-face dialogue.Journal of nonverbal behavior, 35:133–153, 2011. 2

work page 2011

[21] [21]

Speakers’ expe- riences and audience design: Knowing when and knowing how to adjust utterances to addressees.Journal of Memory and Language, 47(4):589–606, 2002

William S Horton and Richard J Gerrig. Speakers’ expe- riences and audience design: Knowing when and knowing how to adjust utterances to addressees.Journal of Memory and Language, 47(4):589–606, 2002. 2

work page 2002

[22] [22]

de Lima, Silvano Martello, Fl´avio K

Manuel Iori, Vin ´ıcius L. de Lima, Silvano Martello, Fl´avio K. Miyazawa, and Michele Monaci. Exact solution techniques for two-dimensional cutting and packing.Eu- ropean Journal of Operational Research, 289(2):399–415,

work page

[23] [23]

Vectorfusion: Text-to-svg by abstracting pixel-based diffusion models

Ajay Jain, Amber Xie, and Pieter Abbeel. Vectorfusion: Text-to-svg by abstracting pixel-based diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition, pages 1911–1920, 2023. 2

work page 1911

[24] [24]

Categorical Reparameterization with Gumbel-Softmax

Eric Jang, Shixiang Gu, and Ben Poole. Categorical reparameterization with gumbel-softmax.arXiv preprint arXiv:1611.01144, 2016. 12

work page internal anchor Pith review Pith/arXiv arXiv 2016

[25] [25]

Abstract visual reasoning with tangram shapes

Anya Ji, Noriyuki Kojima, Noah Rush, Alane Suhr, Wai Keen V ong, Robert Hawkins, and Yoav Artzi. Abstract visual reasoning with tangram shapes. InProceedings of the 2022 Conference on Empirical Methods in Natural Lan- guage Processing, pages 582–601, 2022. 2

work page 2022

[26] [26]

Scaling up visual and vision-language representa- tion learning with noisy text supervision

Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc Le, Yun-Hsuan Sung, Zhen Li, and Tom Duerig. Scaling up visual and vision-language representa- tion learning with noisy text supervision. InInternational conference on machine learning, pages 4904–4916. PMLR,

work page

[27] [27]

Almost op- timal exploration in multi-armed bandits

Zohar Karnin, Tomer Koren, and Oren Somekh. Almost op- timal exploration in multi-armed bandits. InInternational 9 conference on machine learning, pages 1238–1246. PMLR,

work page

[28] [28]

Vilt: Vision- and-language transformer without convolution or region su- pervision

Wonjae Kim, Bokyung Son, and Ildoo Kim. Vilt: Vision- and-language transformer without convolution or region su- pervision. InInternational conference on machine learning, pages 5583–5594. PMLR, 2021. 2

work page 2021

[29] [29]

Stochastic beams and where to find them: The gumbel-top-k trick for sampling sequences without replacement

Wouter Kool, Herke Van Hoof, and Max Welling. Stochastic beams and where to find them: The gumbel-top-k trick for sampling sequences without replacement. InInternational Conference on Machine Learning, pages 3499–3508. PMLR,

work page

[30] [30]

Changes in ref- erence phrases as a function of frequency of usage in social interaction: A preliminary study.Psychonomic Science, 1: 113–114, 1964

Robert M Krauss and Sidney Weinheimer. Changes in ref- erence phrases as a function of frequency of usage in social interaction: A preliminary study.Psychonomic Science, 1: 113–114, 1964. 2

work page 1964

[31] [31]

Differentiable vector graphics rasterization for editing and learning.ACM Transactions on Graphics (TOG), 39(6):1–15, 2020

Tzu-Mao Li, Michal Luk ´aˇc, Micha ¨el Gharbi, and Jonathan Ragan-Kelley. Differentiable vector graphics rasterization for editing and learning.ACM Transactions on Graphics (TOG), 39(6):1–15, 2020. 2

work page 2020

[32] [32]

Text-to-image genera- tion for abstract concepts

Jiayi Liao, Xu Chen, Qiang Fu, Lun Du, Xiangnan He, Xiang Wang, Shi Han, and Dongmei Zhang. Text-to-image genera- tion for abstract concepts. InProceedings of the AAAI Con- ference on Artificial Intelligence, pages 3360–3368, 2024. 2

work page 2024

[33] [33]

Mirror diffusion models for constrained and wa- termarked generation

Guan-Horng Liu, Tianrong Chen, Evangelos Theodorou, and Molei Tao. Mirror diffusion models for constrained and wa- termarked generation. InThirty-seventh Conference on Neu- ral Information Processing Systems, 2023. 2

work page 2023

[34] [34]

Opal: Multimodal image generation for news illustration

Vivian Liu, Han Qiao, and Lydia Chilton. Opal: Multimodal image generation for news illustration. InProceedings of the 35th Annual ACM Symposium on User Interface Software and Technology, pages 1–17, 2022. 2

work page 2022

[35] [35]

Object- centric learning with slot attention

Francesco Locatello, Dirk Weissenborn, Thomas Un- terthiner, Aravindh Mahendran, Georg Heigold, Jakob Uszkoreit, Alexey Dosovitskiy, and Thomas Kipf. Object- centric learning with slot attention. InAdvances in Neural In- formation Processing Systems (NeurIPS 33), pages 11525– 11538, 2020. 2

work page 2020

[36] [36]

Vilbert: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks.Advances in neural information processing systems, 32, 2019

Jiasen Lu, Dhruv Batra, Devi Parikh, and Stefan Lee. Vilbert: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks.Advances in neural information processing systems, 32, 2019. 2

work page 2019

[37] [37]

Towards layer- wise image vectorization

Xu Ma, Yuqian Zhou, Xingqian Xu, Bin Sun, Valerii Filev, Nikita Orlov, Yun Fu, and Humphrey Shi. Towards layer- wise image vectorization. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16314–16323, 2022. 2

work page 2022

[38] [38]

Guglielmo Padula, Francesco Romor, Giovanni Stabile, and Gianluigi Rozza. Generative models for the deformation of industrial shapes with linear geometric constraints: Model order and parameter space reductions.Computer Methods in Applied Mechanics and Engineering, 423:116823, 2024. 2

work page 2024

[39] [39]

Sketchgen: Generating constrained cad sketches.Advances in Neural Information Processing Systems, 34:5077–5088, 2021

Wamiq Para, Shariq Bhat, Paul Guerrero, Tom Kelly, Niloy Mitra, Leonidas J Guibas, and Peter Wonka. Sketchgen: Generating constrained cad sketches.Advances in Neural Information Processing Systems, 34:5077–5088, 2021. 2

work page 2021

[40] [40]

Learning transferable visual models from natural language supervision

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision. InProceedings of the 38th International Conference on Machine Learning (ICML) 2021, pages 8748–8763, 2021. 2

work page 2021

[41] [41]

Learn- ing transferable visual models from natural language super- vision

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learn- ing transferable visual models from natural language super- vision. InInternational Conference on Machine Learning, pages 8748–8763. PMLR, 2021. 2, 13

work page 2021

[42] [42]

High-resolution image synthesis with latent diffusion models

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Bj ¨orn Ommer. High-resolution image synthesis with latent diffusion models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022, page †, 2022. † Fill in actual page numbers. 1

work page 2022

[43] [43]

Styleclipdraw: Coupling content and style in text-to-drawing translation.arXiv preprint arXiv:2202.12362, 2022

Peter Schaldenbrand, Zhixuan Liu, and Jean Oh. Styleclip- draw: Coupling content and style in text-to-drawing transla- tion.arXiv preprint arXiv:2202.12362, 2022. 2

work page arXiv 2022

[44] [44]

Mastering atari, go, chess and shogi by planning with a learned model.Nature, 588(7839):604–609, 2020

Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, et al. Mastering atari, go, chess and shogi by planning with a learned model.Nature, 588(7839):604–609, 2020. 13

work page 2020

[45] [45]

Proximal Policy Optimization Algorithms

John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Rad- ford, and Oleg Klimov. Proximal policy optimization algo- rithms.arXiv preprint arXiv:1707.06347, 2017. 2

work page internal anchor Pith review Pith/arXiv arXiv 2017

[46] [46]

Mastering the game of go without human knowledge.nature, 550(7676): 354–359, 2017

David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lu- cas Baker, Matthew Lai, Adrian Bolton, et al. Mastering the game of go without human knowledge.nature, 550(7676): 354–359, 2017. 12

work page 2017

[47] [47]

A general rein- forcement learning algorithm that masters chess, shogi, and go through self-play.Science, 362(6419):1140–1144, 2018

David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Lau- rent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lill- icrap, Karen Simonyan, and Demis Hassabis. A general rein- forcement learning algorithm that masters chess, shogi, and go through self-play.Science, 362(6419):1140–1144, 2018. 2

work page 2018

[48] [48]

Creative blends of visual concepts.arXiv preprint arXiv:2502.16062,

Zhida Sun, Zhenyao Zhang, Yue Zhang, Min Lu, Dani Lischinski, Daniel Cohen-Or, and Hui Huang. Creative blends of visual concepts.arXiv preprint arXiv:2502.16062,

work page arXiv

[49] [49]

Mcts based on simple regret

David Tolpin and Solomon Shimony. Mcts based on simple regret. InProceedings of the AAAI Conference on Artificial Intelligence, pages 570–576, 2012. 12

work page 2012

[50] [50]

Untersuchungen zur lehre von der gestalt

Max Wertheimer. Untersuchungen zur lehre von der gestalt. ii.Psychologische Forschung, 4:301–350, 1923. In German. 1

work page 1923

[51] [51]

Icon- shop: Text-guided vector icon synthesis with autoregressive transformers.ACM Transactions on Graphics (TOG), 42(6): 1–14, 2023

Ronghuan Wu, Wanchao Su, Kede Ma, and Jing Liao. Icon- shop: Text-guided vector icon synthesis with autoregressive transformers.ACM Transactions on Graphics (TOG), 42(6): 1–14, 2023. 2

work page 2023

[52] [52]

Vismantic: Meaning- making with images

Ping Xiao and Simo Matias Linkola. Vismantic: Meaning- making with images. InInternational Conference on Com- putational Creativity, pages 158–165. Brigham Young Uni- versity, 2015. 2 10

work page 2015

[53] [53]

Diffsketcher: Text guided vector sketch synthesis through latent diffusion models.Advances in Neu- ral Information Processing Systems, 36:15869–15889, 2023

Ximing Xing, Chuang Wang, Haitao Zhou, Jing Zhang, Qian Yu, and Dong Xu. Diffsketcher: Text guided vector sketch synthesis through latent diffusion models.Advances in Neu- ral Information Processing Systems, 36:15869–15889, 2023. 2

work page 2023

[54] [54]

John I Yellott Jr. The relationship between luce’s choice ax- iom, thurstone’s theory of comparative judgment, and the double exponential distribution.Journal of Mathematical Psychology, 15(2):109–144, 1977. 12

work page 1977

[55] [55]

Monte carlo tree diffusion for system 2 planning.arXiv preprint arXiv:2502.07202, 2025

Jaesik Yoon, Hyeonseo Cho, Doojin Baek, Yoshua Bengio, and Sungjin Ahn. Monte carlo tree diffusion for system 2 planning.arXiv preprint arXiv:2502.07202, 2025. 2

work page arXiv 2025

[56] [56]

arXiv preprint arXiv:2502.05625 , year =

Stefano Zampini, Jacob Christopher, Luca Oneto, Davide Anguita, and Ferdinando Fioretto. Training-free constrained generation with stable diffusion models.arXiv preprint arXiv:2502.05625, 2025. 2

work page arXiv 2025

[57] [57]

Enforcing impre- cise constraints on generative adversarial networks for emu- lating physical systems.Communications in Computational Physics, 30(3):635–665, 2021

Yang Zeng, Jin-Long Wu, and Heng Xiao. Enforcing impre- cise constraints on generative adversarial networks for emu- lating physical systems.Communications in Computational Physics, 30(3):635–665, 2021. 2 11 A. Preliminary: Gumbel Muzero Gumbel Muzero [11] is a variant of the AlphaZero [46] algorithm with a major modification in the MCTS algo- rithm. Inst...

work page 2021

[58] [58]

Which image better matches the description below?

is a bandit algorithm for simple regret minimisation. It aims at maximising theQ-value of the last or the finally selected action in simulations, which differs from the cu- mulative regrets that maximise theQfrom allnsimulations [49]. Algorithm 2Sequential Halving with Gumbel 1:procedureSEQHALGUMBEL(s, N, K, g, π θ, f, σ)▷ s: current state;N: simulation b...

work page 2048