TO-Agents: A Multi-Agent AI Pipeline for Preference-Guided Topology Optimization
Pith reviewed 2026-05-22 09:29 UTC · model grok-4.3
The pith
A multi-agent AI pipeline translates qualitative design preferences into effective topology optimization parameters through iterative visual critique and revision.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
TO-Agents converts a human-provided problem description into validated solver inputs, runs a topology optimization solver, renders the resulting 3D topology, and uses multi-view vision-language reasoning with an independent judge agent to critique each result and revise solver parameters. In evaluations on cantilever beam and phone-stand tasks with a preference for hierarchically branched natural morphologies, the system produces at least one preference-aligned design in 60% of trials across four revision cycles in ten replicates, achieving up to 6x more successful trials than an ablated pipeline without visual or historical feedback.
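As a rough illustration of the loop described above, the following Python sketch wires a solve-and-render step, a judge, and a reviser into a fixed number of cycles. Everything named here is hypothetical: `SolverParams`, `run_cycles`, the toy callables, and the 0.7 alignment threshold are assumptions made for illustration and do not come from the paper. The toy judge is only a stand-in for the vision-language model.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class SolverParams:
    # Hypothetical parameter set; the real solver inputs are not specified here.
    penalty: float = 3.0
    volume_fraction: float = 0.3
    filter_radius: float = 2.3


@dataclass(frozen=True)
class Attempt:
    params: SolverParams
    score: float  # judge score in [0, 1]; the scale is an assumption


def run_cycles(
    initial: SolverParams,
    solve_and_render: Callable[[SolverParams], List[str]],
    judge: Callable[[List[str]], float],
    revise: Callable[[List[Attempt]], SolverParams],
    n_cycles: int = 4,
) -> List[Attempt]:
    """Run n_cycles rounds of solve, render, critique, and revise."""
    history: List[Attempt] = []
    params = initial
    for _ in range(n_cycles):
        views = solve_and_render(params)  # multi-view renders (stubbed here)
        history.append(Attempt(params, judge(views)))
        params = revise(history)  # the reviser sees the full history
    return history


# Toy stand-ins so the sketch runs without a solver or a vision-language model.
def toy_solve_and_render(p: SolverParams) -> List[str]:
    return [f"view{i}:r={p.filter_radius:.2f}" for i in range(3)]


def toy_judge(views: List[str]) -> float:
    radius = float(views[0].split("=")[1])
    return max(0.0, 1.0 - abs(radius - 1.5) / 1.5)  # arbitrary scoring rule


def toy_revise(history: List[Attempt]) -> SolverParams:
    best = max(history, key=lambda a: a.score)  # build on the best attempt so far
    new_radius = max(1.5, best.params.filter_radius - 0.2)
    return replace(best.params, filter_radius=new_radius)


history = run_cycles(SolverParams(), toy_solve_and_render, toy_judge, toy_revise)
scores = [a.score for a in history]
aligned = any(s >= 0.7 for s in scores)  # hypothetical alignment threshold
```

Passing the whole history to the reviser, rather than only the latest attempt, mirrors the historical feedback that the ablated pipeline lacks.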
What carries the argument
The independent judge agent that applies multi-view vision-language reasoning to critique rendered topology results and propose solver parameter revisions based on alignment with human aesthetic preferences.
If this is right
- The pipeline enables end-to-end design from intent to manufacturable prototype by adding a manufacturing agent for post-processing.
- Designers can focus on specifying high-level form and function rather than manually tuning solver parameters.
- The system can recover from poor parameter revisions by incorporating historical feedback.
- Exploration of the design space is expanded through repeated cycles of critique and adjustment.
Where Pith is reading between the lines
- This method could extend to other engineering optimization problems where qualitative goals like aesthetics or usability must guide numerical solvers.
- Combining the agent pipeline with more sophisticated rendering or simulation feedback might further improve alignment with complex preferences.
- Failure modes such as overshooting parameter changes suggest the need for safeguards like bounded revision steps in future autonomous design systems.
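One way to picture the bounded revision steps suggested in the last point is a clamp on how far a proposed value may move in a single cycle. This is a sketch, not something the paper describes: `bounded_revision` and the 25% step cap are assumptions. The 1.5 floor in the example echoes the filter-radius rule quoted in the appendix prompt later on this page.

```python
from typing import Optional


def bounded_revision(
    current: float,
    proposed: float,
    max_rel_step: float = 0.25,  # assumed cap: at most 25% change per cycle
    lower: Optional[float] = None,
    upper: Optional[float] = None,
) -> float:
    """Limit a proposed parameter change to a relative step, then apply hard bounds."""
    max_delta = abs(current) * max_rel_step  # note: a current value of 0 cannot move
    delta = max(-max_delta, min(max_delta, proposed - current))
    value = current + delta
    if lower is not None:
        value = max(lower, value)
    if upper is not None:
        value = min(upper, value)
    return value


# An agent proposes dropping the filter radius from 3.0 straight to 1.0.
stepped = bounded_revision(3.0, 1.0, lower=1.5)  # moves only to 2.25
floored = bounded_revision(1.6, 1.0, lower=1.5)  # stops at the 1.5 floor
```

Capping the relative step keeps a single overconfident revision from jumping across the parameter range, at the cost of needing more cycles to reach distant settings.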
Load-bearing premise
The independent judge agent using multi-view vision-language reasoning can reliably identify effective parameter revisions that align the topology output with qualitative human preferences such as hierarchically branched natural morphologies.
What would settle it
A controlled experiment comparing success rates when the judge agent receives only text descriptions of the design versus full multi-view 3D renders would show whether visual reasoning is the key driver of improved parameter revisions.
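With only ten replicates per arm, a comparison like this would likely need an exact test rather than a normal approximation. The sketch below computes a two-sided Fisher exact p-value from a 2x2 table of successes and failures. The function name and the example counts (6 of 10 versus 2 of 10) are hypothetical and are not results from the paper.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]].

    Rows are experimental arms; columns are (successes, failures).
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(k: int) -> float:
        # Hypergeometric probability of k successes in the first arm.
        return comb(row1, k) * comb(row2, col1 - k) / total

    k_min, k_max = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum every table that is no more likely than the observed one.
    return sum(
        prob(k) for k in range(k_min, k_max + 1) if prob(k) <= p_obs * (1 + 1e-9)
    )


# Hypothetical outcome: multi-view judge 6/10 successes, text-only judge 2/10.
p_value = fisher_exact_two_sided(6, 4, 2, 8)
```

With counts this small, even a threefold difference in success rate leaves a p-value well above 0.05, which suggests such an experiment would need more replicates to be decisive.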
Original abstract
Topology optimization can generate efficient structures, but designers often must manually translate qualitative intent, such as desired visual style, product experience, or manufacturability, into solver settings that are not directly tied to those preferences. We present TO-Agents, a multi-agent AI framework that connects natural-language design intent with iterative topology optimization. The framework converts a human-provided problem description into validated solver inputs, runs a topology optimization solver, renders the resulting 3D topology, and uses multi-view vision-language reasoning with an independent judge agent to critique each result and revise solver parameters. We evaluate the framework on two long-horizon design tasks: a cantilever beam benchmark and a phone-stand product design. In both tasks, the designer specifies an aesthetic preference for hierarchically branched structures inspired by natural tree morphologies, and the system performs four revision cycles across ten independent replicates. TO-Agents produces at least one preference-aligned design in 60% of trials for each case study, corresponding to up to 6x more successful trials than an ablated pipeline without visual or historical feedback. Judge scores and human evaluations show that the pipeline can identify effective parameter levers, recover from poor revisions, and expand design exploration. A manufacturing agent further post-processes top-ranked designs for additive manufacturing, enabling end-to-end intent-to-prototype design. We also identify failure modes, including overshooting, selective memory, misplaced tools, and incorrect parameter reasoning. These results suggest that agentic topology optimization can shift designers from low-level parameter tuning toward higher-level specification of form and function, while highlighting safeguards needed for reliable autonomous engineering design.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces TO-Agents, a multi-agent pipeline that converts natural-language design preferences (e.g., hierarchically branched natural morphologies) into topology-optimization solver inputs, renders 3D results, and uses an independent VLM judge agent for multi-view critique and parameter revision over four cycles. Evaluated on cantilever-beam and phone-stand benchmarks with ten replicates each, the system reports a 60% rate of producing at least one preference-aligned design per case study, up to 6x more successful trials than an ablation without visual or historical feedback. The work also describes a manufacturing post-processing agent and catalogues failure modes such as overshooting and selective memory.
Significance. If the empirical claims hold under rigorous validation, the work offers a concrete demonstration that agentic workflows can bridge qualitative intent to quantitative topology optimization, potentially reducing manual parameter tuning in engineering design. The explicit ablation, identification of concrete failure modes, and end-to-end manufacturing step are positive features that support practical utility.
major comments (3)
- §4 (Experimental Evaluation): The headline claim of 60% success rate and up to 6x improvement over ablation is load-bearing for the central contribution, yet the manuscript supplies no error bars, confidence intervals, exact operational definition of 'preference-aligned' (e.g., how branched morphology is scored from renders), or full protocol for the ten replicates and four cycles.
- §3.2 (Judge Agent): The independent VLM judge's multi-view reasoning is the sole arbiter of success; no quantitative agreement metric (Pearson correlation, Cohen's kappa, or similar) between judge scores and human raters is reported, leaving the reliability of the 60% figure and ablation delta open to known VLM limitations on geometric style assessment.
- Ablation study (within §4): The comparison to the pipeline 'without visual or historical feedback' requires explicit specification of which components are removed and confirmation that trial counts, cycle limits, and random seeds are matched to support the 6x factor.
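For reference, the Cohen's kappa statistic named in the §3.2 comment can be computed directly from paired labels. The sketch below is a plain-Python version; the `cohens_kappa` helper and the example judge and human labels are hypothetical and are not data from the paper.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items with identical labels.
    p_observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    p_chance = sum(counts_a[k] * counts_b[k] for k in labels) / (n * n)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical binary labels (1 = preference-aligned) for ten designs.
judge_labels = [1, 1, 1, 0, 0, 1, 0, 1, 0, 0]
human_labels = [1, 1, 0, 0, 0, 1, 0, 1, 1, 0]
kappa = cohens_kappa(judge_labels, human_labels)  # 80% raw agreement, kappa near 0.6
```

Kappa discounts the agreement expected by chance, which is why 80% raw agreement in this toy example corresponds to a kappa of only about 0.6.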
minor comments (2)
- Abstract: The statement that 'judge scores and human evaluations show...' would be strengthened by indicating the number of human raters and the precise rating protocol used.
- Failure-modes discussion: Quantifying the frequency of each listed mode (overshooting, selective memory, etc.) across the 20 total trials would make the analysis more informative.
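The frequency tally requested in the last comment is simple to produce once each trial carries failure-mode labels. The sketch below assumes a hypothetical annotation format (one list of labels per trial) and invented labels for the 20 trials; only the four mode names come from the abstract.

```python
from collections import Counter
from typing import Dict, List, Sequence

# Failure modes as named in the abstract.
FAILURE_MODES = (
    "overshooting",
    "selective memory",
    "misplaced tools",
    "incorrect parameter reasoning",
)


def failure_mode_frequencies(trial_labels: Sequence[List[str]]) -> Dict[str, float]:
    """Fraction of trials in which each failure mode was observed at least once."""
    n_trials = len(trial_labels)
    # set() ensures a mode repeated within one trial is counted once.
    counts = Counter(mode for labels in trial_labels for mode in set(labels))
    return {mode: counts[mode] / n_trials for mode in FAILURE_MODES}


# Invented annotations for 20 trials, for illustration only.
example_labels = (
    [["overshooting"]] * 4
    + [["selective memory", "overshooting"]] * 2
    + [["misplaced tools"]]
    + [[]] * 13
)
frequencies = failure_mode_frequencies(example_labels)
```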
Simulated Author's Rebuttal
We thank the referee for the constructive comments and positive assessment of the work's potential utility. We address each major comment point by point below, indicating revisions to be incorporated in the next version of the manuscript.
-
Referee: §4 (Experimental Evaluation): The headline claim of 60% success rate and up to 6x improvement over ablation is load-bearing for the central contribution, yet the manuscript supplies no error bars, confidence intervals, exact operational definition of 'preference-aligned' (e.g., how branched morphology is scored from renders), or full protocol for the ten replicates and four cycles.
Authors: We agree that additional statistical detail and protocol transparency will strengthen the presentation. The success rate is defined as the fraction of the ten independent replicates (each running four cycles) in which at least one output was scored by the judge agent as exhibiting the requested hierarchically branched morphology across the provided multi-view renders. In the revision we will report binomial confidence intervals on the 60% figure, supply the precise judge scoring rubric for branched morphology, and include a step-by-step protocol for replicate generation and cycle execution.
revision: yes
-
Referee: §3.2 (Judge Agent): The independent VLM judge's multi-view reasoning is the sole arbiter of success; no quantitative agreement metric (Pearson correlation, Cohen's kappa, or similar) between judge scores and human raters is reported, leaving the reliability of the 60% figure and ablation delta open to known VLM limitations on geometric style assessment.
Authors: We acknowledge the value of a quantitative inter-rater agreement metric. The manuscript already notes that human evaluations were collected and aligned with judge scores on the evaluated designs; however, no formal statistic such as Cohen's kappa was computed. We will add this analysis in the revised §3.2 and §4 by reporting agreement on the subset of designs that received human ratings.
revision: yes
-
Referee: Ablation study (within §4): The comparison to the pipeline 'without visual or historical feedback' requires explicit specification of which components are removed and confirmation that trial counts, cycle limits, and random seeds are matched to support the 6x factor.
Authors: We will clarify the ablation definition in the revised manuscript. The ablated pipeline removes both the visual critique step performed by the judge agent and the historical context passed to the parameter-revision agent, while retaining identical trial counts (ten replicates), cycle limits (four), and random seeds to ensure a controlled comparison.
revision: yes
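The first response above defines success per replicate and promises binomial confidence intervals without naming a method. As a sketch of what that could look like, the code below applies the success definition (at least one aligned output across the cycles) and computes a Wilson score interval. The choice of the Wilson interval, the helper names, and the per-cycle flags are assumptions; only the 6-of-10 count follows from the reported 60% over ten replicates.

```python
from math import sqrt
from typing import List, Sequence, Tuple


def replicate_succeeded(cycle_aligned: Sequence[bool]) -> bool:
    """A replicate succeeds if any of its cycles produced an aligned design."""
    return any(cycle_aligned)


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    p_hat = successes / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Invented per-cycle flags for ten replicates of four cycles, chosen to give 6 successes.
replicates: List[List[bool]] = (
    [[False, False, True, False]] * 4
    + [[False, True, True, False]] * 2
    + [[False, False, False, False]] * 4
)
n_success = sum(replicate_succeeded(r) for r in replicates)
low, high = wilson_interval(n_success, len(replicates))  # roughly (0.31, 0.83)
```

An interval this wide is the practical meaning of the referee's concern: with ten replicates, a 60% point estimate is compatible with true success rates from about one in three to about five in six.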
Circularity Check
No circularity: empirical trial counts and ablation are independent of fitted inputs or self-definitions
full rationale
The paper describes a multi-agent pipeline and reports direct experimental outcomes (60% success across ten replicates per task, up to 6x improvement over ablation) measured by explicit trial execution and judge/human scoring. No mathematical derivation, parameter fitting to target data, or self-referential definition appears in the abstract or described framework; success is counted from runs rather than predicted from quantities defined in terms of the same runs. The ablation baseline and mention of human evaluations supply external comparison points, keeping the central claims self-contained against the stated protocol.
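The matched ablation baseline that this rationale relies on can be stated as configuration. The sketch below encodes the two arms as described in the rebuttal (visual critique and historical context removed; replicates, cycles, and seeds held equal). The `RunConfig` class, its field names, and the seed scheme are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RunConfig:
    name: str
    visual_critique: bool   # judge agent critiques multi-view renders
    history_context: bool   # prior attempts passed to the parameter-revision agent
    n_replicates: int = 10
    n_cycles: int = 4
    base_seed: int = 0      # hypothetical seeding scheme

    def seeds(self) -> List[int]:
        return [self.base_seed + i for i in range(self.n_replicates)]


FULL = RunConfig("full", visual_critique=True, history_context=True)
ABLATED = RunConfig("ablated", visual_critique=False, history_context=False)


def is_matched(a: RunConfig, b: RunConfig) -> bool:
    """True if two arms differ only in which feedback components are enabled."""
    return (
        a.n_replicates == b.n_replicates
        and a.n_cycles == b.n_cycles
        and a.seeds() == b.seeds()
    )


matched = is_matched(FULL, ABLATED)
```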
Axiom & Free-Parameter Ledger
free parameters (1)
- number of revision cycles = 4
axioms (1)
- domain assumption: Multi-view renders of 3D topologies provide sufficient information for vision-language models to critique alignment with qualitative preferences
invented entities (1)
- Judge agent: no independent evidence
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
TO-Agents produces at least one preference-aligned design in 60% of trials... uses multi-view vision-language reasoning with an independent judge agent to critique each result and revise solver parameters.
-
IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · absolute_floor_iff_bare_distinguishability · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
SIMP penalty p... volume fraction f... filter radius r_min... mesh resolution
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
A practical generative design method
Krish, S., 2011. “A practical generative design method”. Computer-Aided Design, 43(1), Jan., pp. 88–100
-
[2]
Length scale and manufacturability in density-based topology optimization
Lazarov, B. S., Wang, F., and Sigmund, O., 2016. “Length scale and manufacturability in density-based topology optimization”. Archive of Applied Mechanics, 86(1-2), Jan., pp. 189–218
-
[3]
Regenwetter, L., Abu Obaideh, Y., and Ahmed, F., 2025. “Multi-Objective Counterfactuals for Design: A Model-Agnostic Counterfactual Search Method for Multi-Modal Design Modifications”. Journal of Mechanical Design, 147(2), Feb., p. 021401
-
[4]
A 99 line topology optimization code written in Matlab
Sigmund, O., 2001. “A 99 line topology optimization code written in Matlab”. Structural and Multidisciplinary Optimization, 21(2), Apr., pp. 120–127
-
[5]
Efficient topology optimization in MATLAB using 88 lines of code
Andreassen, E., Clausen, A., Schevenels, M., Lazarov, B. S., and Sigmund, O., 2011. “Efficient topology optimization in MATLAB using 88 lines of code”. Structural and Multidisciplinary Optimization, 43(1), Jan., pp. 1–16
-
[6]
On benchmarking and good scientific practise in topology optimization
Sigmund, O., 2022. “On benchmarking and good scientific practise in topology optimization”. Structural and Multidisciplinary Optimization, 65(11), Nov., p. 315
-
[7]
Loos, S., Wolk, S. V. D., Graaf, N. D., Hekkert, P., and Wu, J., 2022. “Towards intentional aesthetics within topology optimization by applying the principle of unity-in-variety”. Structural and Multidisciplinary Optimization, 65(7), July, p. 185
-
[8]
Saadi, J. I., Chong, L., and Yang, M. C., 2024. “The effect of targeting both quantitative and qualitative objectives in generative design tools on the design outcomes”. Research in Engineering Design, 35(4), Oct., pp. 409–425
-
[9]
HiTop 2.0: combining topology optimisation with multiple feature size controls and human preferences
Schiffer, G., Ha, D. Q., and Carstensen, J. V., 2023. “HiTop 2.0: combining topology optimisation with multiple feature size controls and human preferences”. Virtual and Physical Prototyping, 18(1), Dec., p. e2268603
-
[10]
Combining structural performance and designer preferences in evolutionary design space exploration
Mueller, C. T., and Ochsendorf, J. A., 2015. “Combining structural performance and designer preferences in evolutionary design space exploration”. Automation in Construction, 52, Apr., pp. 70–82
-
[11]
Handling integrated quantitative and qualitative search space in a real world optimisation problem
Oduguwa, V., Tiwari, A., and Roy, R., 2003. “Handling integrated quantitative and qualitative search space in a real world optimisation problem”. In The 2003 Congress on Evolutionary Computation, 2003. CEC ’03., Vol. 2, IEEE, pp. 1222–1229
-
[12]
Brintrup, A. M., Ramsden, J., and Tiwari, A., 2007. “An interactive genetic algorithm-based framework for handling qualitative criteria in design optimization”. Computers in Industry, 58(3), Apr., pp. 279–291
-
[13]
Saadi, J., and Yang, M., 2023. “OBSERVATIONS ON THE IMPLICATIONS OF GENERATIVE DESIGN TOOLS ON DESIGN PROCESS AND DESIGNER BEHAVIOUR”. Proceedings of the Design Society, 3, July, pp. 2805–2814
-
[14]
Alam, M. F., and Ahmed, F., 2024. “GenCAD: Image-Conditioned Computer-Aided Design Generation with Transformer-Based Contrastive Representation and Diffusion Priors”. arXiv preprint arXiv:2409.16294
- [15]
-
[16]
“GenCAD-Three-Dimensional: Computer-Aided Design Program Generation Using Multimodal Latent Space Alignment and Synthetic Dataset Balancing”. Journal of Mechanical Design, 148(3), Mar., p. 031703
-
[17]
BikeBench: A Bicycle Design Benchmark for Generative Models with Objectives and Constraints
Regenwetter, L., Obaideh, Y. A., Chiotti, F., Lykourentzou, I., and Ahmed, F., 2025. “BikeBench: A Bicycle Design Benchmark for Generative Models with Objectives and Constraints”. arXiv preprint arXiv:2508.00830
-
[18]
DrivAerNet: A Parametric Car Dataset for Data-Driven Aerodynamic Design and Prediction
Elrefaie, M., Dai, A., and Ahmed, F., 2025. “DrivAerNet: A Parametric Car Dataset for Data-Driven Aerodynamic Design and Prediction”. Journal of Mechanical Design, 147(4), Apr., p. 041712
-
[19]
BlendedNet: A Blended Wing Body Aircraft Dataset and Surrogate Model for Aerodynamic Predictions
Sung, N., Spreizer, S., Elrefaie, M., Samuel, K., Jones, M. C., and Ahmed, F., 2025. “BlendedNet: A Blended Wing Body Aircraft Dataset and Surrogate Model for Aerodynamic Predictions”. In Volume 3B: 51st Design Automation Conference (DAC), American Society of Mechanical Engineers, p. V03BT03A049
-
[20]
Large Language Models Are Human-Level Prompt Engineers
Zhou, Y., Muresanu, A. I., Han, Z., Paster, K., Pitis, S., Chan, H., and Ba, J., 2022. “Large Language Models Are Human-Level Prompt Engineers”. arXiv preprint arXiv:2211.01910
-
[21]
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., and Polosukhin, I., 2017. “Attention Is All You Need”. arXiv preprint arXiv:1706.03762
-
[22]
A Survey of Large Language Models
Zhao, W. X., Zhou, K., Li, J., Tang, T., Wang, X., Hou, Y., Min, Y., Zhang, B., Zhang, J., Dong, Z., Du, Y., Yang, C., Chen, Y., Chen, Z., Jiang, J., Ren, R., Li, Y., Tang, X., Liu, Z., Liu, P., Nie, J.-Y., and Wen, J.-R., 2023. “A Survey of Large Language Models”. arXiv preprint arXiv:2303.18223
-
[23]
Emergent Abilities of Large Language Models
Wei, J., Tay, Y., Bommasani, R., Raffel, C., Zoph, B., Borgeaud, S., Yogatama, D., Bosma, M., Zhou, D., Metzler, D., Chi, E. H., Hashimoto, T., Vinyals, O., Liang, P., Dean, J., and Fedus, W., 2022. “Emergent Abilities of Large Language Models”. arXiv preprint arXiv:2206.07682
-
[24]
Llama 2: Open Foundation and Fine-Tuned Chat Models
Touvron, H., Martin, L., Stone, K., Albert, P., Almahairi, A., Babaei, Y., Bashlykov, N., Batra, S., Bhargava, P., Bhosale, S., Bikel, D., Blecher, L., Ferrer, C. C., Chen, M., Cucurull, G., Esiobu, D., Fernandes, J., Fu, J., Fu, W., Fuller, B., Gao, C., Goswami, V., Goyal, N., Hartshorn, A., Hosseini, S., Hou, R., Inan, H., Kardas, M., Kerkez, V., Kha...
-
[25]
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., and Zhou, D., 2022. “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models”. arXiv preprint arXiv:2201.11903
-
[26]
Evaluating Large Language Models in Scientific Discovery
Song, Z., Lu, J., Du, Y., Yu, B., Pruyn, T. M., Huang, Y., Guo, K., Luo, X., Qu, Y., Qu, Y., Wang, Y., Wang, H., Guo, J., Gan, J., Shojaee, P., Luo, D., Bran, A. M., Li, G., Zhao, Q., Luo, S.-X. L., Zhang, Y., Zou, X., Zhao, W., Zhang, Y. F., Zhang, W., Zheng, S., Zhang, S., Khan, S. T., Rajabi-Kochi, M., Paradi-Maropakis, S., Baltoiu, T., Xie, F....
-
[27]
Zhou, Y., Liu, H., Srivastava, T., Mei, H., and Tan, C.,
-
[28]
Hypothesis Generation with Large Language Models
“Hypothesis Generation with Large Language Models”. In Proceedings of the 1st Workshop on NLP for Science (NLP4Science), Association for Computational Linguistics, pp. 117–139
-
[29]
Exploring the role of large language models in the scientific method: from hypothesis to discovery
Zhang, Y., Khan, S. A., Mahmud, A., Yang, H., Lavin, A., Levin, M., Frey, J., Dunnmon, J., Evans, J., Bundy, A., Dzeroski, S., Tegner, J., and Zenil, H., 2025. “Exploring the role of large language models in the scientific method: from hypothesis to discovery”. npj Artificial Intelligence, 1(1), Aug., p. 14
-
[30]
Higher-Order Knowledge Representations for Agentic Scientific Reasoning
Stewart, I. A., and Buehler, M. J., 2026. “Higher-Order Knowledge Representations for Agentic Scientific Reasoning”. arXiv preprint arXiv:2601.04878, Jan. arXiv:2601.04878 [cs]
-
[31]
Agentic AI: The age of reasoning—A review
Nisa, U., Shirazi, M., Saip, M. A., and Pozi, M. S. M., 2025. “Agentic AI: The age of reasoning—A review”. Journal of Automation and Intelligence, Aug., p. S2949855425000516
-
[32]
From Language to Action: A Review of Large Language Models as Autonomous Agents and Tool Users
Chowa, S. S., Alvi, R., Rahman, S. S., Rahman, M. A., Raiaan, M. A. K., Islam, M. R., Hussain, M., and Azam, S., 2025. “From Language to Action: A Review of Large Language Models as Autonomous Agents and Tool Users”. arXiv preprint arXiv:2508.17281
-
[33]
ReAct: Synergizing Reasoning and Acting in Language Models
Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., and Cao, Y., 2022. “ReAct: Synergizing Reasoning and Acting in Language Models”. arXiv preprint arXiv:2210.03629
-
[34]
AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation
Wu, Q., Bansal, G., Zhang, J., Wu, Y., Li, B., Zhu, E., Jiang, L., Zhang, X., Zhang, S., Liu, J., Awadallah, A. H., White, R. W., Burger, D., and Wang, C., 2023. “AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation”. arXiv preprint arXiv:2308.08155
-
[35]
Exploration of LLM Multi-Agent Application Implementation Based on LangGraph+CrewAI
Duan, Z., and Wang, J., 2024. “Exploration of LLM Multi-Agent Application Implementation Based on LangGraph+CrewAI”. arXiv preprint arXiv:2411.18241
-
[36]
Wang, J., and Duan, Z., 2024. “Agent AI with LangGraph: A Modular Framework for Enhancing Machine Translation Using Large Language Models”. arXiv preprint arXiv:2412.03801
- [37]
-
[38]
GraphAgents: Knowledge Graph-Guided Agentic AI for Cross-Domain Materials Design
“GraphAgents: Knowledge Graph-Guided Agentic AI for Cross-Domain Materials Design”. arXiv preprint arXiv:2602.07491
-
[39]
Robin: A multi-agent system for automating scientific discovery
Ghareeb, A. E., Chang, B., Mitchener, L., Yiu, A., Szostkiewicz, C. J., Laurent, J. M., Razzak, M. T., White, A. D., Hinks, M. M., and Rodriques, S. G., 2025. “Robin: A multi-agent system for automating scientific discovery”. arXiv preprint arXiv:2505.13400
-
[40]
AI Agents in Engineering Design: A Multi-Agent Framework for Aesthetic and Aerodynamic Car Design
Elrefaie, M., Qian, J., Wu, R., Chen, Q., Dai, A., and Ahmed, F., 2025. “AI Agents in Engineering Design: A Multi-Agent Framework for Aesthetic and Aerodynamic Car Design”. In Volume 3B: 51st Design Automation Conference (DAC), American Society of Mechanical Engineers, p. V03BT03A048
-
[41]
Ni, B., and Buehler, M. J., 2023. “MechAgents: Large language model multi-agent collaborations can solve mechanics problems, generate new data, and integrate knowledge”. arXiv preprint arXiv:2311.08166
-
[42]
Agentic Large Language Models for Conceptual Systems Engineering and Design
Massoudi, S., and Fuge, M., 2025. “Agentic Large Language Models for Conceptual Systems Engineering and Design”. In Volume 3B: 51st Design Automation Conference (DAC), American Society of Mechanical Engineers, p. V03BT03A045
-
[43]
Colvin, S., Jolibois, E., Ramezani, H., Badaracco, A. G., Dorsey, T., Montague, D., Matveenko, S., Trylesinski, M., Runkle, S., Hewitt, D., and Hall, A., 2024. Pydantic (version 2.9.0). GitHub repository, https://github.com/pydantic/pydantic. Accessed: 2026-05-18
- [44]
-
[45]
Bambu Lab, 2026. Bambu Studio (version 2.5.0). Computer software. Available at https://bambulab.com/en-us/download/studio, accessed 2026-05-18
-
[46]
Wang, P., Bai, S., Tan, S., Wang, S., Fan, Z., Bai, J., Chen, K., Liu, X., Wang, J., Ge, W., Fan, Y., Dang, K., Du, M., Ren, X., Men, R., Liu, D., Zhou, C., Zhou, J., and Lin, J.,
-
[47]
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution
“Qwen2-VL: Enhancing Vision-Language Model’s Perception of the World at Any Resolution”. arXiv preprint arXiv:2409.12191
-
[48]
Self-Preference Bias in LLM-as-a-Judge
Wataoka, K., Takahashi, T., and Ri, R., 2024. “Self-Preference Bias in LLM-as-a-Judge”. arXiv preprint arXiv:2410.21819
-
[49]
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Zheng, L., Chiang, W.-L., Sheng, Y., Zhuang, S., Wu, Z., Zhuang, Y., Lin, Z., Li, Z., Li, D., Xing, E. P., Zhang, H., Gonzalez, J. E., and Stoica, I., 2023. “Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena”. arXiv preprint arXiv:2306.05685
-
[50]
Gemma 3
Gemma Team, 2025. “Gemma 3”. Kaggle Technical Report. Available at https://goo.gle/Gemma3Report, accessed 2026-05-18
-
[51]
Kwon, W., Li, Z., Zhuang, S., Sheng, Y., Zheng, L., Yu, C. H., Gonzalez, J. E., Zhang, H., and Stoica, I.,
-
[52]
Efficient Memory Management for Large Language Model Serving with PagedAttention
“Efficient Memory Management for Large Language Model Serving with PagedAttention”. arXiv preprint arXiv:2309.06180
-
[53]
Schroeder, W., Martin, K., and Lorensen, B., 2006. The Visualization Toolkit: An Object-Oriented Approach to 3D Graphics. Kitware
-
[54]
K3d: Lightweight wrapper to run k3s in docker. https://k3d.io/stable/
K3d Project, n.d. K3d: Lightweight wrapper to run k3s in docker. https://k3d.io/stable/. Accessed: 2026-05-18.
A AI Judge Agent Prompt
The exact text supplied to the AI Judge for each layer is reproduced verbatim in Figure 8.
B Vision Agent Prompt
Vision Agent system prompt is reproduced in Figure 9, and the revision instructions injected at each turn ar...
-
[55]
for WebGL-based rendering, and Playwright for automated screenshot capture within an asynchronous Python workflow. Input data consist of nodal coordinates, element connectivity, nodal forces, boundary constraints, and element density variables from topology optimization. When density variables are provided, elements with density ρ < 0.5 are excluded from...
-
[56]
Visually compare the structures
-
[57]
Identify which parameter changes correlated with higher scores
-
[58]
Identify which changes led to worse results
-
[59]
Build on what worked and avoid repeating what didn’t
-
[60]
Be specific about which parameters to change and by how much.
Follow these rules:
-
[61]
The design must remain functional as a phone stand where the phone lies along the diagonal surface
-
[62]
Do not go below a density filter radius of 1.5.
FIGURE 10: Revision instructions injected into the vision agent’s context at each turn, governing how it ingests prior iterations and what hard constraints it must respect.
context to the vision agent.
E Post-Processing Geometry Rules
Following topology optimization, the optimized density volume x_Phys is m...