MultiMat: Multimodal Program Synthesis for Procedural Materials using Large Multimodal Models

arXiv: 2509.22151 · v3 · pith:CWRFAWGB · submitted 2025-09-26 · cs.CV · cs.CL

Jonas Belouadi, Tamy Boubekeur, Adrien Kaiser

Pith reviewed 2026-05-18 13:37 UTC · model grok-4.3

keywords: procedural materials, material node graphs, program synthesis, multimodal models, computer graphics, texture generation, graph synthesis, large multimodal models

The pith

Multimodal models that process both images and text of node graphs generate procedural material programs more efficiently and with higher visual quality than text-only baselines.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces MultiMat, a framework that uses large multimodal models to synthesize programs for procedural materials by handling both visual depictions and textual descriptions of node graphs. These graphs are directed acyclic structures that produce 2D channels like roughness, displacement, albedo, and conductivity maps for 3D object appearance. Earlier neural methods represented the same graphs only as text, which overlooks the visual-spatial relationships that make the graphs easy for humans to understand and edit. The authors train on a new dataset of production-quality materials and add a constrained tree search step to keep generated programs statically correct. If the central claim holds, this multimodal route would make creating parametric, high-resolution materials faster and more accurate in both free-form and conditioned settings.
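To make the "directed acyclic structure" concrete, here is a minimal sketch of a material node graph evaluated in dependency order. The operations (`constant`, `blend`, `invert`), the node names, and the 2x2 grids are hypothetical stand-ins chosen for illustration; they are not the node library or graph format used in the paper.

```python
from graphlib import TopologicalSorter

# Hypothetical node operations on tiny 2x2 grayscale "images" (lists of rows).
def constant(value, size=2):
    return [[value] * size for _ in range(size)]

def blend(a, b, weight=0.5):
    return [[(1 - weight) * x + weight * y for x, y in zip(ra, rb)]
            for ra, rb in zip(a, b)]

def invert(a):
    return [[1.0 - x for x in row] for row in a]

# Illustrative graph: node name -> (operation, input node names, parameters).
GRAPH = {
    "base":      (constant, [], {"value": 0.2}),
    "detail":    (constant, [], {"value": 0.8}),
    "mix":       (blend, ["base", "detail"], {"weight": 0.25}),
    "roughness": (invert, ["mix"], {}),
}

def evaluate(graph):
    """Evaluate every node after its inputs; raises CycleError if not a DAG."""
    deps = {name: set(inputs) for name, (_, inputs, _) in graph.items()}
    results = {}
    for name in TopologicalSorter(deps).static_order():
        op, inputs, params = graph[name]
        results[name] = op(*(results[i] for i in inputs), **params)
    return results

outputs = evaluate(GRAPH)
```

Every intermediate result stays available in `outputs`, which mirrors the point made above that intermediate states of the graph are inspectable, not only the final channels.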

Core claim

We present MultiMat, a multimodal program synthesis framework that leverages large multimodal models to process both visual and textual graph representations for improved generation of procedural material graphs. We train our models on a new dataset of production-quality procedural materials and combine them with a constrained tree search inference algorithm that ensures static correctness while efficiently navigating the program space. Our experimental results show that our multimodal program synthesis method is more efficient in both unconditional and conditional graph synthesis with higher visual quality and fidelity than text-only baselines, establishing new state-of-the-art performance.

What carries the argument

The multimodal program synthesis framework that processes visual and textual representations of node graphs with large multimodal models, paired with a constrained tree search inference algorithm to guarantee static correctness.
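The general idea of a search that only extends statically valid partial programs can be sketched as follows. This is an illustration under stated assumptions, not the authors' algorithm: `propose` is a random stand-in for the model, `ARITY` is a made-up operation table, and the validity check covers only operation names, input counts, and references to earlier nodes.

```python
import random

# Hypothetical op table: op name -> number of inputs it requires.
ARITY = {"noise": 0, "blur": 1, "blend": 2, "output": 1}

def is_valid(program, node):
    """Static check: known op, correct arity, inputs refer to earlier nodes."""
    op, inputs = node
    return (op in ARITY
            and len(inputs) == ARITY[op]
            and all(0 <= i < len(program) for i in inputs))

def propose(program, rng):
    """Stand-in for the model: samples a possibly invalid next node."""
    op = rng.choice(list(ARITY) + ["unknown_op"])
    n_inputs = ARITY.get(op, 1)
    # Indices may point past the end of the program, making the node invalid.
    inputs = tuple(rng.randrange(len(program) + 1) for _ in range(n_inputs))
    return (op, inputs)

def search(rng, max_nodes=6, samples_per_step=12, budget=2000):
    """Depth-first search over partial programs with a global proposal budget."""
    calls = 0

    def expand(program):
        nonlocal calls
        if program and program[-1][0] == "output":
            return program                  # complete program found
        if len(program) >= max_nodes:
            return None                     # dead end, caller backtracks
        for _ in range(samples_per_step):
            if calls >= budget:
                return None                 # out of budget
            calls += 1
            node = propose(program, rng)
            if not is_valid(program, node):
                continue                    # invalid proposals are never expanded
            result = expand(program + [node])
            if result is not None:
                return result
        return None

    return expand([])

program = search(random.Random(0))
```

`search` returns either a complete program ending in an `output` node or `None` when the budget runs out. Because invalid proposals are never expanded, any returned program passes the static check at every step by construction.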

If this is right

Multimodal synthesis produces material node graphs more efficiently than text-only methods in both unconditional and conditional settings.
The generated graphs achieve measurably higher visual quality and fidelity to target appearances.
The constrained tree search guarantees static correctness of every output program.
The approach establishes new state-of-the-art performance on the task of procedural material graph synthesis.
The method supports modular, interpretable workflows for interactive appearance modeling at arbitrary resolution.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same visual-plus-textual input strategy might improve synthesis of other graph-structured visual programs such as shader networks or filter graphs.
Embedding the method in artist tools could reduce the professional training currently required to author production materials.
Success on this task implies that visual inputs help models capture spatial adjacency and layout relations that sequential text encodings tend to lose.
Creating similarly comprehensive datasets would be the main bottleneck when extending the technique to new material domains or rendering pipelines.

Load-bearing premise

Jointly processing visual and textual representations captures the visual-spatial structure of node graphs in a way that text-only models cannot, and the new production-quality dataset covers the domain sufficiently without introducing biases that limit generalization.

What would settle it

Retraining an equivalent text-only model on the same dataset and finding it matches or exceeds the multimodal version in synthesis efficiency, visual quality, and fidelity on held-out material graphs would falsify the superiority claim.
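One way such a head-to-head comparison could be scored is a paired bootstrap over per-material scores on the held-out set. The sketch below assumes a single scalar quality score per material where higher is better; the function name, the metric, and the numbers are placeholders for illustration and are not results from the paper.

```python
import random
from statistics import mean

def paired_bootstrap(scores_a, scores_b, n_resamples=2000, seed=0):
    """Return (mean difference A - B, fraction of resamples where A beats B)."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    wins = 0
    for _ in range(n_resamples):
        # Resample materials with replacement, keeping each pair together.
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        if mean(sample) > 0:
            wins += 1
    return mean(diffs), wins / n_resamples

# Placeholder scores for illustration only; NOT taken from the paper.
multimodal = [0.71, 0.64, 0.80, 0.58, 0.77]
text_only = [0.69, 0.66, 0.74, 0.55, 0.70]
delta, win_rate = paired_bootstrap(multimodal, text_only)
```

A win rate near 0.5 or a mean difference near zero would point toward the text-only model matching the multimodal one. Pairing by material matters because difficulty varies widely between materials.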

Figures

Figures reproduced from arXiv: 2509.22151 by Adrien Kaiser, Jonas Belouadi, Tamy Boubekeur.

**Figure 2.** Architecture overview of MultiMat during inference. The system constructs a multimodal program tree $T$ by iteratively generating node definitions. At each step $t$, the system derives a graph $G_t$ of valid nodes along with corresponding intermediate outputs $I_t$ by traversing $T$, which may contain both valid and invalid nodes, to generate the next node $v_{t+1}$. When transpilation and execution succeed, the system adv…

**Figure 3.** Visualization of the two conditioning approaches used by …

**Figure 4.** Visualization of our inference algorithm as a tree search. Tree nodes represent generated …

**Figure 5.** Qualitative results for inverse procedural material modeling. The leftmost column shows …

**Figure 6.** Representative failure cases from the same challenging subset in Figure …

**Figure 7.** Example materials generated unconditionally by …

**Figure 8.** Complete example of a graph in CompactSBS format. This listing shows the full representation of the material partially illustrated in …

Original abstract

Material node graphs are programs that generate the 2D channels of procedural materials, including geometry such as roughness and displacement maps, and reflectance such as albedo and conductivity maps. They are essential in computer graphics for representing the appearance of virtual 3D objects parametrically and at arbitrary resolution. In particular, their directed acyclic graph structure and intermediate states enable a modular, interpretable workflow for interactive appearance modeling. However, creating such graphs remains challenging and typically requires professional training. While recent neural program synthesis approaches attempt to simplify this process, they solely represent graphs as textual programs, failing to capture the inherently visual-spatial nature of node graphs that makes them accessible to humans. To address this gap, we present MultiMat, a multimodal program synthesis framework that leverages large multimodal models to process both visual and textual graph representations for improved generation of procedural material graphs. We train our models on a new dataset of production-quality procedural materials and combine them with a constrained tree search inference algorithm that ensures static correctness while efficiently navigating the program space. Our experimental results show that our multimodal program synthesis method is more efficient in both unconditional and conditional graph synthesis with higher visual quality and fidelity than text-only baselines, establishing new state-of-the-art performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

MultiMat's multimodal take on material node graphs plus constrained search is a practical step forward, but the gains need to be checked against matched baselines to confirm it's not just bigger models or new data.

The main thing to know is that this paper tries to move neural program synthesis for procedural materials beyond text-only representations by feeding large multimodal models both visual renderings of the node graphs and their text descriptions, then using constrained tree search at inference to keep outputs statically correct. They also introduce a new dataset of production-quality materials. That combination addresses a real usability gap in graphics, where these graphs are spatial and visual by nature, and the search step is a sensible engineering choice to avoid invalid programs. The abstract reports better efficiency and visual quality than text baselines in both unconditional and conditional settings, which would be useful if it holds up. The work is aimed squarely at people building or using procedural material tools, and the dataset plus search method are concrete contributions that practitioners could build on.

The soft spot is the comparison setup. The stress-test note is right to flag that without details on whether the text-only baselines used the same model scale, training volume, or even explicit spatial descriptions in text, it's unclear if the reported edge comes from the multimodal input or from other differences. The abstract itself gives no numbers, ablations, or statistical details, so the full experimental section needs to show fair controls before the multimodal claim lands solidly.

If those controls are there and the gains persist, this is worth referee time for the graphics community. Otherwise it risks overstating the novelty of the input format. I'd bring it to a reading group for the dataset and search idea but would wait on citing until the baselines are clearer.

Referee Report

2 major / 2 minor

Summary. The paper introduces MultiMat, a multimodal program synthesis framework that uses large multimodal models to generate procedural material node graphs by jointly processing visual renderings and textual representations. It presents a new production-quality dataset of procedural materials and combines the models with a constrained tree search inference algorithm to ensure static correctness. The central claim is that this multimodal approach yields more efficient unconditional and conditional graph synthesis with higher visual quality and fidelity than text-only baselines, establishing new state-of-the-art performance.

Significance. If the empirical claims hold after controlling for confounds, the work would advance procedural material generation in computer graphics by demonstrating that multimodal inputs can better capture the visual-spatial structure of node graphs than text alone. The new dataset of production-quality materials represents a concrete, reusable contribution that could support future research in this area.

major comments (2)

[Experimental results / baselines] Experimental section (assumed §4 or §5 based on standard structure): the superiority claim over text-only baselines is load-bearing for the central contribution, yet the abstract and available description provide no quantitative metrics, baseline model sizes, training data volumes, or confirmation that text baselines received equivalent spatial layout information. Without these controls, gains cannot be attributed specifically to visual-spatial encoding rather than differences in model capacity or data.
[Method / inference] Inference algorithm description: the constrained tree search is presented as ensuring static correctness and efficiency, but no ablation or comparison is referenced showing its contribution independent of the multimodal model; if the search is identical across multimodal and text-only settings, this must be stated explicitly to support the efficiency claim.

minor comments (2)

[Introduction] Notation for node graphs and channels (e.g., roughness, albedo) should be introduced with a small diagram or table early in the paper for readers unfamiliar with procedural materials.
[Abstract / Introduction] The abstract asserts 'new state-of-the-art performance' without citing the specific prior neural program synthesis works being surpassed; add explicit citations in the introduction and results.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments. We respond to each major comment in turn and indicate the changes made to the manuscript.

Referee: [Experimental results / baselines] Experimental section (assumed §4 or §5 based on standard structure): the superiority claim over text-only baselines is load-bearing for the central contribution, yet the abstract and available description provide no quantitative metrics, baseline model sizes, training data volumes, or confirmation that text baselines received equivalent spatial layout information. Without these controls, gains cannot be attributed specifically to visual-spatial encoding rather than differences in model capacity or data.

Authors: We agree that clearer documentation of controls is necessary to attribute improvements specifically to the multimodal input. The full manuscript reports quantitative results in Section 5, including visual quality metrics (e.g., rendered image fidelity and perceptual scores) and efficiency metrics (e.g., graph size and generation time). Both the multimodal and text-only baselines employ the same underlying model architecture and are trained on identical data volumes; the sole distinction is the input modality. The text-only baseline receives only the textual program representation and no additional spatial layout information. We have revised Section 4.1 to add an explicit description of the baseline configurations, model sizes, and training data volumes, along with a statement confirming that text baselines received no extra spatial cues.
revision: yes
Referee: [Method / inference] Inference algorithm description: the constrained tree search is presented as ensuring static correctness and efficiency, but no ablation or comparison is referenced showing its contribution independent of the multimodal model; if the search is identical across multimodal and text-only settings, this must be stated explicitly to support the efficiency claim.

Authors: The referee is correct that an explicit statement and supporting ablation would strengthen the presentation. The constrained tree search is applied identically in both the multimodal and text-only settings to enforce static correctness during inference. We have added a new ablation subsection in the revised Section 5.3 that isolates the search's contribution by comparing results with and without it across both model types. We have also inserted an explicit statement in Section 3.3 that the inference algorithm is identical for all compared methods.
revision: yes

Circularity Check

0 steps flagged

No circularity: claims rest on empirical comparisons to external baselines

The paper introduces MultiMat as a multimodal program synthesis method trained on a new production-quality dataset of procedural material graphs, then evaluates it via constrained tree search against text-only baselines on metrics of efficiency, visual quality, and fidelity. No equations, derivations, or self-referential definitions appear in the provided text; performance claims are presented as outcomes of training and inference rather than reductions to fitted parameters or prior self-citations. The central argument relies on observable experimental differences, which remain externally falsifiable and independent of internal redefinition.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The framework rests on the untested premise that multimodal models can meaningfully interpret visual layouts of node graphs and that the constrained search preserves both validity and expressiveness; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (2)

domain assumption: Large multimodal models can jointly process visual and textual representations of directed acyclic node graphs to improve synthesis quality over text alone.
This is the core modeling choice stated in the motivation section of the abstract.
domain assumption: A constrained tree search algorithm can efficiently navigate the program space while guaranteeing static correctness.
Invoked to ensure generated graphs remain valid without further elaboration in the abstract.


[1] [1]

Anthropic

doi: 10.1109/FMCAD.2013.6679385. Anthropic. System card: Claude Opus 4 & Claude Sonnet 4,

work page doi:10.1109/fmcad.2013.6679385 2013

[2] [2]

Qwen2.5-VL Technical Report

URLhttps://arxiv.org/abs/2502.13923. Jonas Belouadi, Anne Lauscher, and Steffen Eger. AutomaTikZ: Text-guided synthesis of scientific vector graphics with TikZ. InThe Twelfth International Conference on Learning Representations, Vienna, Austria, May 2024a. URLhttps://openreview.net/forum?id=v3K5TVP8kZ. Jonas Belouadi, Simone Paolo Ponzetto, and Steffen Eg...

work page internal anchor Pith review Pith/arXiv arXiv

[3] [3]

URLhttps://doi.org/10.1007/ s10270-023-01105-5

doi: 10.1007/S10270-023-01105-5. URLhttps://doi.org/10.1007/ s10270-023-01105-5. Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott ...

work page doi:10.1007/s10270-023-01105-5

[4] [4]

Kevin Ellis, Catherine Wong, Maxwell Nye, Mathias Sablé-Meyer, Lucas Morales, Luke Hewitt, Luc Cary, Armando Solar-Lezama, and Joshua B

URLhttps://proceedings.neurips.cc/paper_ files/paper/2019/file/50d2d2262762648589b1943078712aa6-Paper.pdf. Kevin Ellis, Catherine Wong, Maxwell Nye, Mathias Sablé-Meyer, Lucas Morales, Luke Hewitt, Luc Cary, Armando Solar-Lezama, and Joshua B. Tenenbaum. DreamCoder: bootstrapping inductive program synthesis with wake-sleep library learning. InProceedings ...

work page doi:10.1145/3453483.3454080 2019

[5] [5]

doi: 10.1145/3528223.3530173

ISSN0730-0301. doi: 10.1145/3528223.3530173. URLhttps://doi.org/10.1145/3528223. 3530173. Abhimanyu Hans, John Kirchenbauer, Yuxin Wen, Neel Jain, Hamid Kazemi, Prajwal Singhania, Siddharth Singh, Gowthami Somepalli, Jonas Geiping, Abhinav Bhatele, and Tom Goldstein. Be like a goldfish, don’t memorize! mitigating memorization in generative LLMs. InThe Thi...

work page doi:10.1145/3528223.3530173

[6] [6]

Jack Hessel, Ari Holtzman, Maxwell Forbes, Ronan Le Bras, and Yejin Choi

URLhttps://openreview.net/forum? id=SJgdnAVKDH. Jack Hessel, Ari Holtzman, Maxwell Forbes, Ronan Le Bras, and Yejin Choi. CLIPScore: A reference-free evaluation metric for image captioning. In Marie-Francine Moens, Xuanjing Huang, Lucia Specia, and Scott Wen-tau Yih (eds.),Proceedings of the 2021 Conference on Empirical Methods in Natural Language Process...

work page 2021

[7]

doi: 10.18653/v1/2021.findings-emnlp.424

Association for Computational Linguistics. doi: 10.18653/v1/2021.emnlp-main.595. URL https://aclanthology.org/2021.emnlp-main.595. Yiwei Hu, Paul Guerrero, Milos Hasan, Holly Rushmeier, and Valentin Deschaintre. Node graph optimization using differentiable proxies. In ACM SIGGRAPH 2022 Conference Proceedings, SIGGRAPH ’22, New York, NY, USA,

[8]

Time-multiplexed

Association for Computing Machinery. ISBN 9781450393379. doi: 10.1145/3528233.3530733. URL https://doi.org/10.1145/3528233.3530733. Yiwei Hu, Paul Guerrero, Milos Hasan, Holly Rushmeier, and Valentin Deschaintre. Generating procedural materials from text or image prompts. In ACM SIGGRAPH 2023 Conference Proceedings, SIGGRAPH ’23, New York, NY, USA,

[9]

Association for Computing Machinery. ISBN 9798400701597. doi: 10.1145/3588432.3591520. URL https://doi.org/10.1145/3588432.3591520. Nam Huynh and Beiyu Lin. Large language models for code generation: A comprehensive survey of challenges, techniques, evaluation, and applications,

[10]

Large language models for code generation: A comprehensive survey of challenges, techniques, evaluation, and applications,

URL https://arxiv.org/abs/2503.01245. Shreyas Kapur, Erik Jenner, and Stuart Russell. Diffusion on syntax trees for program synthesis. In The Thirteenth International Conference on Learning Representations,

[11]

Shengzhi Li and Nima Tajbakhsh

URL https://arxiv.org/abs/2408.12637. Beichen Li, Liang Shi, and Wojciech Matusik. End-to-end procedural material capture with proxy-free mixed-integer optimization. ACM Transactions on Graphics (TOG), 42(4):1–15, 2023a. Beichen Li, Yiwei Hu, Paul Guerrero, Milos Hasan, Liang Shi, Valentin Deschaintre, and Wojciech Matusik. Procedural material generation with reinfor...

[12]

ISSN 0730-0301. doi: 10.1145/3687979. URL https://doi.org/10.1145/3687979. Beichen Li, Rundi Wu, Armando Solar-Lezama, Changxi Zheng, Liang Shi, Bernd Bickel, and Wojciech Matusik. VLMaterial: Procedural material generation with large vision-language models. In The Thirteenth International Conference on Learning Representations, 2025a. URL ...

[13]

URL https://openreview.net/forum?id=xqc8yyhScL. Wen-Ding Li, Darren Yan Key, and Kevin Ellis. Toward trustworthy neural program synthesis. In ICLR 2025 Workshop on Human-AI Coevolution, 2025b. URL https://openreview.net/forum?id=HPlvbIJGWy. Yujia Li, David Choi, Junyoung Chung, Nate Kushman, Julian Schrittwieser, Rémi Leblond, Tom Eccles, James Keeling, ...

[14]

Michał Kempka, Marek Wydmuch, Grzegorz Runc, Jakub Toczek, and Wojciech Jaśkowski

doi: 10.1126/science.abq1158. URL https://doi.org/10.1126%2Fscience.abq1158. Chin-Yew Lin. ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out, pp. 74–81, Barcelona, Spain, July

[15]

StarCoder 2 and The Stack v2: The Next Generation

URL https://arxiv.org/abs/2402.19173. Yuyu Luo, Nan Tang, Guoliang Li, Chengliang Chai, Wenbo Li, and Xuedi Qin. Synthesizing natural language to visualization (NL2VIS) benchmarks from NL2SQL benchmarks. In Proceedings of the 2021 International Conference on Management of Data, SIGMOD ’21, pp. 1235–1247, New York, NY, USA,

[16]

Association for Computing Machinery. ISBN 9781450383431. doi: 10.1145/3448016.3457261. URL https://doi.org/10.1145/3448016.3457261. Jock Mackinlay. Automating the design of graphical presentations of relational information. ACM Trans. Graph., 5(2):110–141, April

[17]

ISSN 0730-0301. doi: 10.1145/22949.22950. URL https://doi.org/10.1145/22949.22950. OpenAI, Aaron Hurst, Adam Lerer, Adam P. Goucher, Adam Perelman, Aditya Ramesh, Aidan Clark, AJ Ostrow, Akila Welihinda, Alan Hayes, Alec Radford, Aleksander Mądry, Alex Baker-Whitcomb, Alex Beutel, Alex Borzunov, Alex Carney, Alex Chow, Alex Kirillov, Alex Nichol, and 400 others. GPT...

[18]

GPT-4o System Card

URL https://arxiv.org/abs/2410.21276. Sinno Jialin Pan and Qiang Yang. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10):1345–1359,

[19]

A Survey on Transfer Learning

doi: 10.1109/TKDE.2009.191. Emilio Parisotto, Abdel-rahman Mohamed, Rishabh Singh, Lihong Li, Dengyong Zhou, and Pushmeet Kohli. Neuro-symbolic program synthesis. In International Conference on Learning Representations,

[20]

doi: https://doi.org/10.1111/j.1467-8659.2003.00716.x. URL https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1467-8659.2003.00716.x. Matt Pharr, Wenzel Jakob, and Greg Humphreys. Physically Based Rendering: From Theory to Implementation (3rd ed.). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 3rd edition, November

[21]

URL https://arxiv.org/abs/2501.03992. Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision. In Marina Meila and Tong Zhang (eds.), Proceedings of the 38th ...

[22]

Association for Computing Machinery. ISBN 0897916506. doi: 10.1145/191666.191719. URL https://doi.org/10.1145/191666.191719. Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Tal Remez, Jérémy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton Ferrer, Aar...

[23]

CSGNet: Neural shape parser for constructive solid geometry

Gopal Sharma, Rishabh Goyal, Difan Liu, Evangelos Kalogerakis, and Subhransu Maji. CSGNet: Neural shape parser for constructive solid geometry. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 5515–5523. Computer Vision Foundation / IEEE Computer Society,

[24]

long tail

doi: 10.1109/CVPR.2018.00578. URL http://openaccess.thecvf.com/content_cvpr_2018/html/Sharma_CSGNet_Neural_Shape_CVPR_2018_paper.html. Liang Shi, Beichen Li, Miloš Hašan, Kalyan Sunkavalli, Tamy Boubekeur, Radomir Mech, and Wojciech Matusik. Match: Differentiable material graphs for procedural material capture. ACM Transactions on Graphics (TOG), 39(6):1–15,

[25]

URL https://openreview.net/forum?id=Vi8AepAXGy. Henrik Voigt, Kai Lawonn, and Sina Zarrieß. Plots made quickly: An efficient approach for generating visualizations from natural language queries. In Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, and Nianwen Xue (eds.), Proceedings of the 2024 Joint International Conference on Comput...

[26]

ELRA and ICCL. URL https://aclanthology.org/2024.lrec-main.1119/. Colin Wei, Kendrick Shen, Yining Chen, and Tengyu Ma. Theoretical analysis of self-training with deep networks on unlabeled data. In International Conference on Learning Representations,

[27]

ISSN 0730-0301. doi: 10.1145/3618364. URL https://doi.org/10.1145/3618364. Yang Wu, Yao Wan, Hongyu Zhang, Yulei Sui, Wucai Wei, Wei Zhao, Guandong Xu, and Hai Jin. Automated data visualization from natural language via large language models: An exploratory study. Proc. ACM Manag. Data, 2(3), May

[28]

Qwen3 Technical Report

doi: 10.1145/3654992. URL https://doi.org/10.1145/3654992. An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, Chujie Zheng, Dayiheng Liu, Fan Zhou, Fei Huang, Feng Hu, Hao Ge, Haoran Wei, Huan Lin, Jialong Tang, and 41 others. Qwen3 technical report, 2025a. URL https://arxiv.org/abs/2505....

[29]

ISSN 2095-5138. doi: 10.1093/nsr/nwae403. URL https://doi.org/10.1093/nsr/nwae403. Daoguang Zan, Bei Chen, Fengji Zhang, Dianjie Lu, Bingchao Wu, Bei Guan, Wang Yongji, and Jian-Guang Lou. Large language models meet NL2Code: A survey. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 7443...

[30]

Association for Computational Linguistics. doi: 10.18653/v1/2023.acl-long.411. URL https://aclanthology.org/2023.acl-long.411. Haotian Zhang, Mingfei Gao, Zhe Gan, Philipp Dufter, Nina Wenzel, Forrest Huang, Dhruti Shah, Xianzhi Du, Bowen Zhang, Yanghao Li, Sam Dodge, Keen You, Zhen Yang, Aleksei Timofeev, Mingze Xu, Hong-You Chen, Jean-Philippe Fauconnier...

[31]

A Survey of Large Language Models

URL https://arxiv.org/abs/2303.18223.

Figure 6 (panels: Input, VLMaterial+ (SBS), MultiMat+ (Mixed), MultiMat+ (Graph)): Representative failure cases from the same challenging subset in Figure