
arxiv: 2605.21622 · v1 · pith:MR37JEZFnew · submitted 2026-05-20 · 💻 cs.AI

TO-Agents: A Multi-Agent AI Pipeline for Preference-Guided Topology Optimization

Pith reviewed 2026-05-22 09:29 UTC · model grok-4.3

classification 💻 cs.AI
keywords: topology optimization · multi-agent AI · preference-guided design · vision-language models · iterative optimization · additive manufacturing · design automation

The pith

A multi-agent AI pipeline translates qualitative design preferences into effective topology optimization parameters through iterative visual critique and revision.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces TO-Agents, a framework that automates the connection between natural-language descriptions of desired design qualities and the numerical settings required by topology optimization solvers. It runs the solver, renders the 3D result, and then uses a separate vision-language agent to evaluate the output against the stated preferences and suggest adjustments for the next round. On a standard cantilever beam and a phone-stand design, both targeting tree-branch-inspired structures, at least one preference-aligned design appears in 60 percent of the ten replicate trials for each task within four revision cycles. That is up to six times as many successful trials as an ablated pipeline with the visual feedback and history removed. The work also includes steps to prepare the final shapes for 3D printing.

Core claim

TO-Agents converts a human-provided problem description into validated solver inputs, runs a topology optimization solver, renders the resulting 3D topology, and uses multi-view vision-language reasoning with an independent judge agent to critique each result and revise solver parameters. In evaluations on cantilever beam and phone-stand tasks with a preference for hierarchically branched natural morphologies, the system produces at least one preference-aligned design in 60% of trials across four revision cycles in ten replicates, achieving up to 6x more successful trials than an ablated pipeline without visual or historical feedback.
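The solve, render, critique, revise loop in this claim can be sketched in a few lines. The following is a toy illustration only, not the authors' implementation: `solve_and_render`, `judge_score`, and `revise` are hypothetical stand-ins, the 0 to 10 score scale and the score function that peaks at a filter radius of 2.0 are invented for the example, and the sketch assumes one initial solve followed by four revisions.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    params: dict
    score: float  # judge score on a hypothetical 0-10 scale


def solve_and_render(params: dict) -> dict:
    """Hypothetical stand-in for running the solver and rendering multi-view images."""
    return {"params": dict(params), "views": ["front", "side", "top"]}


def judge_score(render: dict) -> float:
    """Hypothetical stand-in for the judge: a toy score peaking at filter_radius == 2.0."""
    return max(0.0, 10.0 - 4.0 * abs(render["params"]["filter_radius"] - 2.0))


def revise(history: list, step: float = 0.5) -> dict:
    """Hypothetical reviser: restart from the best attempt so far and nudge one parameter.

    Consulting the whole history (not only the last attempt) is what lets the
    loop recover from a revision that made the score worse.
    """
    best = max(history, key=lambda a: a.score)
    last = history[-1]
    # If the latest change scored below the best attempt, move the other way.
    direction = -1.0 if last.score < best.score else 1.0
    new_params = dict(best.params)
    new_params["filter_radius"] = best.params["filter_radius"] + direction * step
    return new_params


def run_trial(initial_params: dict, revisions: int = 4) -> list:
    """Run one initial solve plus `revisions` revised solves; return all attempts."""
    history = []
    params = dict(initial_params)
    for _ in range(revisions + 1):
        render = solve_and_render(params)
        history.append(Attempt(params=params, score=judge_score(render)))
        params = revise(history)
    return history


history = run_trial({"filter_radius": 3.0})
for attempt in history:
    print(attempt.params["filter_radius"], attempt.score)
```

In this toy run the second attempt overshoots in the wrong direction and scores lower, after which the reviser returns to the best earlier attempt and steps the other way. That mirrors, in miniature, the "recover from poor revisions" behavior the paper attributes to historical feedback.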

What carries the argument

The independent judge agent that applies multi-view vision-language reasoning to critique rendered topology results and propose solver parameter revisions based on alignment with human aesthetic preferences.

If this is right

  • The pipeline enables end-to-end design from intent to manufacturable prototype by adding a manufacturing agent for post-processing.
  • Designers can focus on specifying high-level form and function rather than manually tuning solver parameters.
  • The system can recover from poor parameter revisions by incorporating historical feedback.
  • Exploration of the design space is expanded through repeated cycles of critique and adjustment.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This method could extend to other engineering optimization problems where qualitative goals like aesthetics or usability must guide numerical solvers.
  • Combining the agent pipeline with more sophisticated rendering or simulation feedback might further improve alignment with complex preferences.
  • Failure modes such as overshooting parameter changes suggest the need for safeguards like bounded revision steps in future autonomous design systems.
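A bounded revision step of the kind suggested in the last bullet could look like the sketch below. It is an editorial illustration, not something the paper specifies: `bounded_update`, `max_step`, and the upper bound are hypothetical, while the lower bound of 1.5 echoes the density filter radius floor quoted in the paper's revision instructions.

```python
def bounded_update(current: float, proposed: float, max_step: float,
                   lower: float, upper: float) -> float:
    """Clamp a proposed parameter value to a per-step change limit and to hard bounds."""
    # Limit how far a single revision may move the parameter.
    delta = max(-max_step, min(max_step, proposed - current))
    # Then enforce the absolute bounds.
    return max(lower, min(upper, current + delta))


# An agent proposes dropping the filter radius from 3.0 straight to 0.5.
print(bounded_update(3.0, 0.5, max_step=1.0, lower=1.5, upper=6.0))  # 2.0
# A second aggressive proposal is stopped at the hard floor.
print(bounded_update(2.0, 0.5, max_step=1.0, lower=1.5, upper=6.0))  # 1.5
```

Applying the step limit before the hard bounds means an aggressive proposal is first slowed down and then, if necessary, stopped at the floor, so a single bad revision cannot move a parameter arbitrarily far.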

Load-bearing premise

The independent judge agent using multi-view vision-language reasoning can reliably identify effective parameter revisions that align the topology output with qualitative human preferences such as hierarchically branched natural morphologies.

What would settle it

A controlled experiment comparing success rates when the judge agent receives only text descriptions of the design versus full multi-view 3D renders would show whether visual reasoning is the key driver of improved parameter revisions.

Figures

Figures reproduced from arXiv: 2605.21622 by Faez Ahmed, Hongrui Chen, Isabella A. Stewart.

Figure 10. C Human Designer Requests. The two case studies are initiated by a natural-language problem description authored by the human designer and supplied to the agentic pipeline as the starting context. We render these requests in a distinct blue style to visually separate human-authored input from agent-generated prompts elsewhere in the appendix. Within each request, boundary conditions are highlighted in re…
Original abstract

Topology optimization can generate efficient structures, but designers often must manually translate qualitative intent, such as desired visual style, product experience, or manufacturability, into solver settings that are not directly tied to those preferences. We present TO-Agents, a multi-agent AI framework that connects natural-language design intent with iterative topology optimization. The framework converts a human-provided problem description into validated solver inputs, runs a topology optimization solver, renders the resulting 3D topology, and uses multi-view vision-language reasoning with an independent judge agent to critique each result and revise solver parameters. We evaluate the framework on two long-horizon design tasks: a cantilever beam benchmark and a phone-stand product design. In both tasks, the designer specifies an aesthetic preference for hierarchically branched structures inspired by natural tree morphologies, and the system performs four revision cycles across ten independent replicates. TO-Agents produces at least one preference-aligned design in 60% of trials for each case study, corresponding to up to 6x more successful trials than an ablated pipeline without visual or historical feedback. Judge scores and human evaluations show that the pipeline can identify effective parameter levers, recover from poor revisions, and expand design exploration. A manufacturing agent further post-processes top-ranked designs for additive manufacturing, enabling end-to-end intent-to-prototype design. We also identify failure modes, including overshooting, selective memory, misplaced tools, and incorrect parameter reasoning. These results suggest that agentic topology optimization can shift designers from low-level parameter tuning toward higher-level specification of form and function, while highlighting safeguards needed for reliable autonomous engineering design.
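The headline metric is a trial-level rate: a trial counts as a success if any of its designs is preference-aligned. It should not be confused with the share of individual designs that are aligned. The sketch below makes the distinction concrete. The verdicts are illustrative placeholders, not data from the paper, and `trial_success_rate` and `design_success_rate` are hypothetical helper names.

```python
def trial_success_rate(trials):
    """Fraction of trials in which at least one design was judged preference-aligned."""
    successes = sum(1 for verdicts in trials if any(verdicts))
    return successes / len(trials)


def design_success_rate(trials):
    """Fraction of all individual designs judged preference-aligned."""
    total = sum(len(verdicts) for verdicts in trials)
    aligned = sum(sum(1 for v in verdicts if v) for verdicts in trials)
    return aligned / total


# Illustrative verdicts only (NOT results from the paper): 10 trials x 4 cycles.
example_trials = [
    [False, False, True, False],
    [False, True, True, False],
    [False, False, False, True],
    [True, False, False, False],
    [False, False, True, True],
    [False, True, False, False],
    [False, False, False, False],
    [False, False, False, False],
    [False, False, False, False],
    [False, False, False, False],
]

print(trial_success_rate(example_trials))   # 0.6
print(design_success_rate(example_trials))  # 0.2
```

With these placeholder verdicts, a 60% trial-level rate coexists with only 20% of individual designs being aligned, which is why the operational definition of success matters when reading the headline number.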

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces TO-Agents, a multi-agent pipeline that converts natural-language design preferences (e.g., hierarchically branched natural morphologies) into topology-optimization solver inputs, renders 3D results, and uses an independent VLM judge agent for multi-view critique and parameter revision over four cycles. Evaluated on cantilever-beam and phone-stand benchmarks with ten replicates each, the system reports a 60% rate of producing at least one preference-aligned design per case study, up to 6x more successful trials than an ablation without visual or historical feedback. The work also describes a manufacturing post-processing agent and catalogues failure modes such as overshooting and selective memory.

Significance. If the empirical claims hold under rigorous validation, the work offers a concrete demonstration that agentic workflows can bridge qualitative intent to quantitative topology optimization, potentially reducing manual parameter tuning in engineering design. The explicit ablation, identification of concrete failure modes, and end-to-end manufacturing step are positive features that support practical utility.

major comments (3)
  1. §4 (Experimental Evaluation): The headline claim of 60% success rate and up to 6x improvement over ablation is load-bearing for the central contribution, yet the manuscript supplies no error bars, confidence intervals, exact operational definition of 'preference-aligned' (e.g., how branched morphology is scored from renders), or full protocol for the ten replicates and four cycles.
  2. §3.2 (Judge Agent): The independent VLM judge's multi-view reasoning is the sole arbiter of success; no quantitative agreement metric (Pearson correlation, Cohen's kappa, or similar) between judge scores and human raters is reported, leaving the reliability of the 60% figure and ablation delta open to known VLM limitations on geometric style assessment.
  3. Ablation study (within §4): The comparison to the pipeline 'without visual or historical feedback' requires explicit specification of which components are removed and confirmation that trial counts, cycle limits, and random seeds are matched to support the 6x factor.
minor comments (2)
  1. Abstract: The statement that 'judge scores and human evaluations show...' would be strengthened by indicating the number of human raters and the precise rating protocol used.
  2. Failure-modes discussion: Quantifying the frequency of each listed mode (overshooting, selective memory, etc.) across the 20 total trials would make the analysis more informative.
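To give a sense of scale for the missing confidence intervals raised in major comment 1, the sketch below computes a Wilson score interval for 6 successes in 10 trials. The choice of the Wilson method and the 95% level (z = 1.96) are assumptions made for illustration; the paper does not state which interval, if any, it would use.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 gives roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


low, high = wilson_interval(6, 10)
print(f"6/10 successes: approx. 95% Wilson interval = [{low:.3f}, {high:.3f}]")
```

Under these assumptions the interval runs from roughly 0.31 to 0.83, which shows how wide the uncertainty on a 60% rate is with only ten replicates per task.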

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments and positive assessment of the work's potential utility. We address each major comment point by point below, indicating revisions to be incorporated in the next version of the manuscript.

Point-by-point responses
  1. Referee: §4 (Experimental Evaluation): The headline claim of 60% success rate and up to 6x improvement over ablation is load-bearing for the central contribution, yet the manuscript supplies no error bars, confidence intervals, exact operational definition of 'preference-aligned' (e.g., how branched morphology is scored from renders), or full protocol for the ten replicates and four cycles.

    Authors: We agree that additional statistical detail and protocol transparency will strengthen the presentation. The success rate is defined as the fraction of the ten independent replicates (each running four cycles) in which at least one output was scored by the judge agent as exhibiting the requested hierarchically branched morphology across the provided multi-view renders. In the revision we will report binomial confidence intervals on the 60% figure, supply the precise judge scoring rubric for branched morphology, and include a step-by-step protocol for replicate generation and cycle execution.
    revision: yes

  2. Referee: §3.2 (Judge Agent): The independent VLM judge's multi-view reasoning is the sole arbiter of success; no quantitative agreement metric (Pearson correlation, Cohen's kappa, or similar) between judge scores and human raters is reported, leaving the reliability of the 60% figure and ablation delta open to known VLM limitations on geometric style assessment.

    Authors: We acknowledge the value of a quantitative inter-rater agreement metric. The manuscript already notes that human evaluations were collected and aligned with judge scores on the evaluated designs; however, no formal statistic such as Cohen's kappa was computed. We will add this analysis in the revised §3.2 and §4 by reporting agreement on the subset of designs that received human ratings.
    revision: yes

  3. Referee: Ablation study (within §4): The comparison to the pipeline 'without visual or historical feedback' requires explicit specification of which components are removed and confirmation that trial counts, cycle limits, and random seeds are matched to support the 6x factor.

    Authors: We will clarify the ablation definition in the revised manuscript. The ablated pipeline removes both the visual critique step performed by the judge agent and the historical context passed to the parameter-revision agent, while retaining identical trial counts (ten replicates), cycle limits (four), and random seeds to ensure a controlled comparison.
    revision: yes
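The agreement statistic discussed in response 2 can be computed directly from paired labels. The sketch below implements Cohen's kappa for two raters. The judge and human labels are invented placeholders, not ratings from the paper, and the binary "aligned / not aligned" coding is an assumption about how such a comparison might be set up.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters on the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / n
    # Expected agreement by chance, from each rater's label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)
    if expected == 1.0:
        return float("nan")  # kappa is undefined when chance agreement is total
    return (observed - expected) / (1.0 - expected)


# Placeholder labels (NOT from the paper): 1 = aligned, 0 = not aligned.
judge_labels = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
human_labels = [1, 1, 0, 0, 0, 0, 1, 1, 1, 0]

print(round(cohens_kappa(judge_labels, human_labels), 3))  # 0.6
```

In this placeholder example the raters agree on 8 of 10 designs, but because half of that agreement would be expected by chance, kappa is 0.6 rather than 0.8. That gap is the reason a chance-corrected statistic is more informative here than raw percent agreement.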

Circularity Check

0 steps flagged

No circularity: empirical trial counts and ablation are independent of fitted inputs or self-definitions


The paper describes a multi-agent pipeline and reports direct experimental outcomes (60% success across ten replicates per task, up to 6x improvement over ablation) measured by explicit trial execution and judge/human scoring. No mathematical derivation, parameter fitting to target data, or self-referential definition appears in the abstract or described framework; success is counted from runs rather than predicted from quantities defined in terms of the same runs. The ablation baseline and mention of human evaluations supply external comparison points, keeping the central claims self-contained against the stated protocol.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The framework depends on assumptions about the reliability of current vision-language models for engineering critique and on the chosen number of revision cycles; no new physical entities are postulated.

free parameters (1)
  • number of revision cycles = 4
    Fixed at four for the reported experiments to enable iterative refinement.
axioms (1)
  • domain assumption: Multi-view renders of 3D topologies provide sufficient information for vision-language models to critique alignment with qualitative preferences
    Invoked in the critique and revision step of the pipeline.
invented entities (1)
  • Judge agent (no independent evidence)
    purpose: Independent critic that scores results and proposes solver parameter changes
    New component introduced to close the feedback loop between visual output and parameter adjustment.

pith-pipeline@v0.9.0 · 5823 in / 1389 out tokens · 57197 ms · 2026-05-22T09:29:39.748439+00:00 · methodology


