arxiv: 2509.22151 · v3 · pith:CWRFAWGB · submitted 2025-09-26 · 💻 cs.CV · cs.CL

MultiMat: Multimodal Program Synthesis for Procedural Materials using Large Multimodal Models

Pith reviewed 2026-05-18 13:37 UTC · model grok-4.3

classification 💻 cs.CV cs.CL
keywords procedural materials · material node graphs · program synthesis · multimodal models · computer graphics · texture generation · graph synthesis · large multimodal models

The pith

Multimodal models that process both images and text of node graphs generate procedural material programs more efficiently and with higher visual quality than text-only baselines.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces MultiMat, a framework that uses large multimodal models to synthesize programs for procedural materials by handling both visual depictions and textual descriptions of node graphs. These graphs are directed acyclic structures that produce 2D channels like roughness, displacement, albedo, and conductivity maps for 3D object appearance. Earlier neural methods represented the same graphs only as text, which overlooks the visual-spatial relationships that make the graphs easy for humans to understand and edit. The authors train on a new dataset of production-quality materials and add a constrained tree search step to keep generated programs statically correct. If the central claim holds, this multimodal route would make creating parametric, high-resolution materials faster and more accurate in both free-form and conditioned settings.
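
To make the notion of a material node graph concrete, the following is a minimal sketch of a directed acyclic graph whose nodes are evaluated in dependency order to produce output channels. It is an illustration only: the node names, the operations, and the two-value "images" are hypothetical and are not taken from the paper or its dataset.

```python
from graphlib import TopologicalSorter

# Hypothetical toy graph: node name -> (operation, names of input nodes).
# Names and operations are illustrative assumptions, not the paper's format.
GRAPH = {
    "noise": ("constant", []),
    "inverted": ("invert", ["noise"]),
    "roughness": ("scale", ["inverted"]),
    "albedo": ("blend", ["noise", "inverted"]),
}

# Each operation maps input "images" (here: short lists of floats) to an output.
OPS = {
    "constant": lambda: [0.2, 0.8],
    "invert": lambda a: [1.0 - x for x in a],
    "scale": lambda a: [0.5 * x for x in a],
    "blend": lambda a, b: [(x + y) / 2 for x, y in zip(a, b)],
}


def evaluate(graph):
    """Evaluate every node after its inputs; a cyclic graph raises CycleError."""
    deps = {name: set(inputs) for name, (_, inputs) in graph.items()}
    values = {}
    for name in TopologicalSorter(deps).static_order():
        op, inputs = graph[name]
        values[name] = OPS[op](*[values[i] for i in inputs])
    return values


channels = evaluate(GRAPH)
print(channels["roughness"], channels["albedo"])
```

Every intermediate value stays available in `values`, which is the property the paper's summary refers to when it describes intermediate states as enabling a modular, inspectable workflow.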

Core claim

We present MultiMat, a multimodal program synthesis framework that leverages large multimodal models to process both visual and textual graph representations for improved generation of procedural material graphs. We train our models on a new dataset of production-quality procedural materials and combine them with a constrained tree search inference algorithm that ensures static correctness while efficiently navigating the program space. Our experimental results show that our multimodal program synthesis method is more efficient in both unconditional and conditional graph synthesis with higher visual quality and fidelity than text-only baselines, establishing new state-of-the-art performance.

What carries the argument

The multimodal program synthesis framework that processes visual and textual representations of node graphs with large multimodal models, paired with a constrained tree search inference algorithm to guarantee static correctness.
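
The idea of a constrained search that only advances along valid programs can be sketched as a depth-first search with pruning. This is a simplified illustration under stated assumptions, not the authors' algorithm: `propose` stands in for the multimodal model, the check in `is_statically_valid` is a toy rule (unique names, inputs already defined), and all node names are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Node = Tuple[str, Tuple[str, ...]]  # (node name, names of its input nodes)


def is_statically_valid(program: List[Node], node: Node) -> bool:
    """Toy static check: the name is new and every input is already defined."""
    name, inputs = node
    defined = {n for n, _ in program}
    return name not in defined and all(i in defined for i in inputs)


def constrained_search(
    propose: Callable[[List[Node]], List[Node]],
    is_complete: Callable[[List[Node]], bool],
    max_depth: int,
    program: Optional[List[Node]] = None,
) -> Optional[List[Node]]:
    """Depth-first search that prunes invalid candidates and backtracks."""
    program = program or []
    if is_complete(program):
        return program
    if len(program) >= max_depth:
        return None
    for candidate in propose(program):
        if not is_statically_valid(program, candidate):
            continue  # prune: never descend into an invalid branch
        result = constrained_search(
            propose, is_complete, max_depth, program + [candidate]
        )
        if result is not None:
            return result
    return None  # every candidate failed, so the caller backtracks


def toy_propose(program: List[Node]) -> List[Node]:
    """Hypothetical stand-in for a model; some proposals are invalid on purpose."""
    table = [
        [("blend", ("noise",)), ("noise", ())],  # first uses an undefined input
        [("noise", ()), ("blend", ("noise",))],  # first duplicates a name
        [("output", ("blend",))],
    ]
    return table[len(program)] if len(program) < len(table) else []


def toy_complete(program: List[Node]) -> bool:
    return bool(program) and program[-1][0] == "output"


print(constrained_search(toy_propose, toy_complete, max_depth=5))
```

Because invalid candidates are rejected before the search descends, any program this sketch returns passes the toy check by construction. The cost of rejected proposals is what an efficiency comparison between models would measure.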

If this is right

  • Multimodal synthesis produces material node graphs more efficiently than text-only methods in both unconditional and conditional settings.
  • The generated graphs achieve measurably higher visual quality and fidelity to target appearances.
  • The constrained tree search guarantees static correctness of every output program.
  • The approach establishes new state-of-the-art performance on the task of procedural material graph synthesis.
  • The method supports modular, interpretable workflows for interactive appearance modeling at arbitrary resolution.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same visual-plus-textual input strategy might improve synthesis of other graph-structured visual programs such as shader networks or filter graphs.
  • Embedding the method in artist tools could reduce the professional training currently required to author production materials.
  • Success on this task implies that visual inputs help models capture spatial adjacency and layout relations that sequential text encodings tend to lose.
  • Creating similarly comprehensive datasets would be the main bottleneck when extending the technique to new material domains or rendering pipelines.

Load-bearing premise

Jointly processing visual and textual representations captures the visual-spatial structure of node graphs in a way that text-only models cannot, and the new production-quality dataset covers the domain sufficiently without introducing biases that limit generalization.

What would settle it

Retraining an equivalent text-only model on the same dataset and finding it matches or exceeds the multimodal version in synthesis efficiency, visual quality, and fidelity on held-out material graphs would falsify the superiority claim.
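
One way such a comparison could be scored is a paired bootstrap over per-material metric differences on the held-out set. The sketch below is an assumption about how one might run the test, not a procedure described in the paper, and the score lists are placeholders invented purely to exercise the function.

```python
import random
from statistics import mean


def paired_bootstrap_ci(scores_a, scores_b, n_resamples=2000, alpha=0.05, seed=0):
    """Return (mean difference, lower, upper) for paired scores a - b."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    rng = random.Random(seed)
    # Resample the paired differences with replacement and record each mean.
    means = sorted(
        mean(rng.choices(diffs, k=len(diffs))) for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return mean(diffs), lower, upper


# Placeholder scores (NOT results from the paper): one value per held-out material.
multimodal = [0.9, 0.8, 0.7]
text_only = [0.5, 0.5, 0.5]
print(paired_bootstrap_ci(multimodal, text_only))
```

An interval that includes zero or lies below it for the retrained text-only baseline would match the falsifying outcome described above; an interval entirely above zero would not.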

Figures

Figures reproduced from arXiv: 2509.22151 by Adrien Kaiser, Jonas Belouadi, Tamy Boubekeur.

Figure 1: Procedural materials offer powerful control over the appearance of 3D objects through a …
Figure 2: Architecture overview of MultiMat during inference. The system constructs a multimodal program tree T by iteratively generating node definitions. At each step t, the system derives a graph G_t of valid nodes along with corresponding intermediate outputs I_t by traversing T, which may contain both valid and invalid nodes, to generate the next node v_{t+1}. When transpilation and execution succeed, the system adv…
Figure 3: Visualization of the two conditioning approaches used by …
Figure 4: Visualization of our inference algorithm as a tree search. Tree nodes represent generated …
Figure 5: Qualitative results for inverse procedural material modeling. The leftmost column shows …
Figure 6: Representative failure cases from the same challenging subset in Figure …
Figure 7: Example materials generated unconditionally by …
Figure 8: Complete example of a graph in CompactSBS format. This listing shows the full representation of the material partially illustrated in …

Original abstract

Material node graphs are programs that generate the 2D channels of procedural materials, including geometry such as roughness and displacement maps, and reflectance such as albedo and conductivity maps. They are essential in computer graphics for representing the appearance of virtual 3D objects parametrically and at arbitrary resolution. In particular, their directed acyclic graph structure and intermediate states enable a modular, interpretable workflow for interactive appearance modeling. However, creating such graphs remains challenging and typically requires professional training. While recent neural program synthesis approaches attempt to simplify this process, they solely represent graphs as textual programs, failing to capture the inherently visual-spatial nature of node graphs that makes them accessible to humans. To address this gap, we present MultiMat, a multimodal program synthesis framework that leverages large multimodal models to process both visual and textual graph representations for improved generation of procedural material graphs. We train our models on a new dataset of production-quality procedural materials and combine them with a constrained tree search inference algorithm that ensures static correctness while efficiently navigating the program space. Our experimental results show that our multimodal program synthesis method is more efficient in both unconditional and conditional graph synthesis with higher visual quality and fidelity than text-only baselines, establishing new state-of-the-art performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces MultiMat, a multimodal program synthesis framework that uses large multimodal models to generate procedural material node graphs by jointly processing visual renderings and textual representations. It presents a new production-quality dataset of procedural materials and combines the models with a constrained tree search inference algorithm to ensure static correctness. The central claim is that this multimodal approach yields more efficient unconditional and conditional graph synthesis with higher visual quality and fidelity than text-only baselines, establishing new state-of-the-art performance.

Significance. If the empirical claims hold after controlling for confounds, the work would advance procedural material generation in computer graphics by demonstrating that multimodal inputs can better capture the visual-spatial structure of node graphs than text alone. The new dataset of production-quality materials represents a concrete, reusable contribution that could support future research in this area.

major comments (2)
  1. [Experimental results / baselines] Experimental section (assumed §4 or §5 based on standard structure): the superiority claim over text-only baselines is load-bearing for the central contribution, yet the abstract and available description provide no quantitative metrics, baseline model sizes, training data volumes, or confirmation that text baselines received equivalent spatial layout information. Without these controls, gains cannot be attributed specifically to visual-spatial encoding rather than differences in model capacity or data.
  2. [Method / inference] Inference algorithm description: the constrained tree search is presented as ensuring static correctness and efficiency, but no ablation or comparison is referenced showing its contribution independent of the multimodal model; if the search is identical across multimodal and text-only settings, this must be stated explicitly to support the efficiency claim.
minor comments (2)
  1. [Introduction] Notation for node graphs and channels (e.g., roughness, albedo) should be introduced with a small diagram or table early in the paper for readers unfamiliar with procedural materials.
  2. [Abstract / Introduction] The abstract asserts 'new state-of-the-art performance' without citing the specific prior neural program synthesis works being surpassed; add explicit citations in the introduction and results.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments. We respond to each major comment in turn and indicate the changes made to the manuscript.

  1. Referee: [Experimental results / baselines] Experimental section (assumed §4 or §5 based on standard structure): the superiority claim over text-only baselines is load-bearing for the central contribution, yet the abstract and available description provide no quantitative metrics, baseline model sizes, training data volumes, or confirmation that text baselines received equivalent spatial layout information. Without these controls, gains cannot be attributed specifically to visual-spatial encoding rather than differences in model capacity or data.

    Authors: We agree that clearer documentation of controls is necessary to attribute improvements specifically to the multimodal input. The full manuscript reports quantitative results in Section 5, including visual quality metrics (e.g., rendered image fidelity and perceptual scores) and efficiency metrics (e.g., graph size and generation time). Both the multimodal and text-only baselines employ the same underlying model architecture and are trained on identical data volumes; the sole distinction is the input modality. The text-only baseline receives only the textual program representation and no additional spatial layout information. We have revised Section 4.1 to add an explicit description of the baseline configurations, model sizes, and training data volumes, along with a statement confirming that text baselines received no extra spatial cues. revision: yes

  2. Referee: [Method / inference] Inference algorithm description: the constrained tree search is presented as ensuring static correctness and efficiency, but no ablation or comparison is referenced showing its contribution independent of the multimodal model; if the search is identical across multimodal and text-only settings, this must be stated explicitly to support the efficiency claim.

    Authors: The referee is correct that an explicit statement and supporting ablation would strengthen the presentation. The constrained tree search is applied identically in both the multimodal and text-only settings to enforce static correctness during inference. We have added a new ablation subsection in the revised Section 5.3 that isolates the search's contribution by comparing results with and without it across both model types. We have also inserted an explicit statement in Section 3.3 that the inference algorithm is identical for all compared methods. revision: yes

Circularity Check

0 steps flagged

No circularity: claims rest on empirical comparisons to external baselines

The paper introduces MultiMat as a multimodal program synthesis method trained on a new production-quality dataset of procedural material graphs, then evaluates it via constrained tree search against text-only baselines on metrics of efficiency, visual quality, and fidelity. No equations, derivations, or self-referential definitions appear in the provided text; performance claims are presented as outcomes of training and inference rather than reductions to fitted parameters or prior self-citations. The central argument relies on observable experimental differences, which remain externally falsifiable and independent of internal redefinition.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The framework rests on the untested premise that multimodal models can meaningfully interpret visual layouts of node graphs and that the constrained search preserves both validity and expressiveness; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (2)
  • domain assumption: Large multimodal models can jointly process visual and textual representations of directed acyclic node graphs to improve synthesis quality over text alone.
    This is the core modeling choice stated in the motivation section of the abstract.
  • domain assumption: A constrained tree search algorithm can efficiently navigate the program space while guaranteeing static correctness.
    Invoked to ensure generated graphs remain valid without further elaboration in the abstract.

pith-pipeline@v0.9.0 · 5749 in / 1330 out tokens · 47278 ms · 2026-05-18T13:37:54.704833+00:00 · methodology
