LLM4CAD-Editor: An Intent-Aware Large Language Model Framework for Multi-Level Computer-Aided Design Editing

Yuewan Sun; Zhenghui Sha

arxiv: 2606.20607 · v1 · pith:WMVIAJ72 · submitted 2026-05-21 · 💻 cs.HC


Pith reviewed 2026-06-30 15:30 UTC · model grok-4.3

classification 💻 cs.HC

keywords computer-aided design · large language models · domain-specific language · CAD editing · parametric modeling · intent-aware systems · multimodal datasets


The pith

An LLM framework with a symbolic DSL enables reliable multi-level editing of CAD models from natural language instructions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops LLM4CAD-Editor to support iterative CAD design through instruction-guided edits rather than one-shot generation. It creates a domain-specific language that represents CAD features symbolically so language models can select and modify them by name. A dataset of over 35,000 instruction-program pairs is built to train and test the system across parameter, operation, and functional edit types. Fine-tuning a 32B model yields high accuracy on low-level changes and good intent satisfaction on high-level ones, with better robustness than direct Python scripting. This approach aims to make AI tools more useful in actual engineering workflows that involve repeated modifications.

Core claim

LLM4CAD-Editor, based on LLM4CAD-DSL, transforms CAD editing into natural language reasoning by using feature names for entity selection, allowing LLMs to handle low-level parameter modifications and high-level functional edits with high accuracy and low structural errors.

What carries the argument

LLM4CAD-DSL, a structured domain-specific language with a feature-level entity selection mechanism that lets models reference geometry by feature names instead of coordinates.
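As a rough illustration of name-based selection, the Python sketch below models a part as a list of named features and applies an edit by looking a feature up by name. The actual LLM4CAD-DSL syntax is not shown on this page, so the `Feature` and `Model` classes, the operation labels, and feature names such as `mount_hole` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    name: str    # stable, human-readable identifier (hypothetical naming scheme)
    op: str      # operation label, e.g. "extrude" or "pocket" (illustrative only)
    params: dict = field(default_factory=dict)


@dataclass
class Model:
    features: list

    def select(self, name: str) -> Feature:
        """Resolve an entity by its feature name rather than by coordinates."""
        for feature in self.features:
            if feature.name == name:
                return feature
        raise KeyError(f"no feature named {name!r}")

    def set_param(self, name: str, key: str, value: float) -> None:
        """Apply a parameter-level edit to the feature selected by name."""
        feature = self.select(name)
        if key not in feature.params:
            raise KeyError(f"feature {name!r} has no parameter {key!r}")
        feature.params[key] = value


model = Model([
    Feature("base_plate", "extrude", {"width": 80.0, "depth": 40.0, "height": 10.0}),
    Feature("mount_hole", "pocket", {"diameter": 6.0, "depth": 10.0}),
])
# "Make the mounting hole 8 mm wide" becomes a lookup by name plus one assignment.
model.set_param("mount_hole", "diameter", 8.0)
print(model.select("mount_hole").params)  # {'diameter': 8.0, 'depth': 10.0}
```

Because the edit targets `mount_hole` by name, it does not depend on where the hole sits in space; a coordinate-based script would typically have to locate the right face or edge geometrically before changing it.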

If this is right

Parameter-level edits achieve 96.3% parsing accuracy and 0.935 average IoU.
Functional-level edits reach 82% intent satisfaction with 0.708 average IoU.
The system shows 1.4 times better editing robustness than Python-based CAD scripting.
Average editing distances are 0.176, 0.579, and 2.859 at the parameter, operation, and functional levels, respectively.
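The page reports IoU and editing distance without defining how either is computed. A minimal sketch, assuming both shapes are voxelized into sets of occupied (i, j, k) cells and that editing distance is a token-level Levenshtein distance between predicted and reference programs; the paper's actual voxel resolution, tokenization, and any normalization are not given here.

```python
def voxel_iou(a: set, b: set) -> float:
    """Intersection-over-Union of two sets of occupied (i, j, k) voxel indices."""
    union = a | b
    if not union:
        return 1.0  # convention: two empty shapes count as identical
    return len(a & b) / len(union)


def levenshtein(x: list, y: list) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute) over tokens."""
    prev = list(range(len(y) + 1))
    for i, xi in enumerate(x, start=1):
        curr = [i]
        for j, yj in enumerate(y, start=1):
            curr.append(min(
                prev[j] + 1,               # delete xi
                curr[j - 1] + 1,           # insert yj
                prev[j - 1] + (xi != yj),  # substitute (free if tokens match)
            ))
        prev = curr
    return prev[-1]


# Toy shapes: a 4 x 4 x 2 block versus the same block extruded to height 3.
reference = {(i, j, k) for i in range(4) for j in range(4) for k in range(2)}
predicted = {(i, j, k) for i in range(4) for j in range(4) for k in range(3)}
print(round(voxel_iou(reference, predicted), 3))  # 0.667

ref_tokens = "extrude base_plate height 10".split()
pred_tokens = "extrude base_plate height 12".split()
print(levenshtein(ref_tokens, pred_tokens))  # 1
```

A one-token change in the program can still move the geometry a lot, which is one reason to report a geometric metric (IoU) alongside a program-level one (editing distance).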

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The DSL approach could extend to other parametric modeling tools if they expose named features.
Pairing the text-based editor with image inputs might support visual feedback loops for iterative changes.
Larger models or more diverse training data may close the remaining gap in complex functional edits.

Load-bearing premise

That LLMs can reliably reason about geometry when references use feature names instead of coordinate values.

What would settle it

Running the fine-tuned model on a held-out set of editing tasks that involve geometry without assigned feature names or that require explicit coordinate arithmetic, and measuring whether accuracy drops significantly.
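One way to make "drops significantly" concrete is a two-sided two-proportion z-test on task accuracy with and without feature names. The counts below are hypothetical placeholders, not results from the paper, and the test assumes independent tasks and samples large enough for the normal approximation.

```python
import math


def two_proportion_z_test(k1: int, n1: int, k2: int, n2: int):
    """Two-sided z-test for a difference between two success proportions."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("test undefined when all or no tasks succeed")
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical counts: correct edits out of 200 tasks with named features,
# and out of 200 tasks where names are withheld.
z, p = two_proportion_z_test(180, 200, 140, 200)
print(f"z = {z:.2f}, p = {p:.1e}")
```

A small p-value here would only say the accuracy gap is unlikely to be sampling noise; whether the gap is practically large is a separate judgment.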

Original abstract

Large language models (LLMs) have recently enabled automatic generation of parametric computer-aided design (CAD) programs from natural language. However, real-world CAD workflows are inherently iterative and require reliable editing rather than one-shot model synthesis. In this work, we propose LLM4CAD-Editor, an LLM-based intent-aware framework for instruction-guided CAD editing based on a structured domain-specific language (LLM4CAD-DSL). The symbolic representation of LLM4CAD-DSL enables robust geometric modification through a feature-level entity selection mechanism, allowing models to reference geometry via feature names instead of coordinates, thus transforming fragile coordinate-based reasoning into natural language-based reasoning that many LLMs can handle. We construct a multimodal CAD editing dataset with over 35,139 instruction-program pairs via DSL-based augmentation and vision-language instruction synthesis, covering functional-, operation-, and parameter-level editing intents. To validate the work, we fine-tuned a 32B-parameter language model for DSL editing generation. Experimental results show high parsing accuracy for parameter-level edits (96.3%) and strong intent satisfaction rates of 82% for functional instructions. The model also achieves an average Intersection-over-Union (IoU) of 0.935 for parameter-level edits, 0.871 for operation-level edits, and 0.708 for functional-level edits, while the corresponding average editing distances are 0.176, 0.579, and 2.859, respectively. Comparative studies further demonstrate a significant improvement in editing robustness by 1.4x over Python-based CAD scripting approaches. These results confirm that LLM4CAD-Editor can reliably perform both low-level parameter modifications and high-level functional edits, maintaining high accuracy and low structural errors across diverse editing tasks.
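The abstract names DSL-based augmentation as one dataset construction route but does not spell it out. The sketch below shows one plausible parameter-level variant under stated assumptions: a program is a hypothetical mapping from feature name to numeric parameters, and each pair is produced by rescaling one named parameter and filling an instruction template. It is not the authors' pipeline, and it ignores the vision-language instruction synthesis entirely.

```python
import copy
import random

# Hypothetical seed program: feature name -> numeric parameters (not real DSL syntax).
SEED = {
    "base_plate": {"width": 80.0, "height": 10.0},
    "mount_hole": {"diameter": 6.0},
}


def augment_parameter_edits(program, n, scale_range=(0.5, 1.5), seed=0):
    """Create (instruction, source, target) triples by rescaling one named parameter."""
    rng = random.Random(seed)  # fixed seed keeps the synthetic pairs reproducible
    pairs = []
    for _ in range(n):
        target = copy.deepcopy(program)  # never mutate the source program
        feature = rng.choice(sorted(target))
        param = rng.choice(sorted(target[feature]))
        new_value = round(target[feature][param] * rng.uniform(*scale_range), 1)
        target[feature][param] = new_value
        instruction = f"Change the {param} of {feature} to {new_value} mm"
        pairs.append((instruction, program, target))
    return pairs


for instruction, _, _ in augment_parameter_edits(SEED, 3):
    print(instruction)
```

Because the instruction refers to the feature by name, the same template works for any seed program that exposes named features; a real pipeline would presumably also check that each edited program still builds valid geometry.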

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper adds a feature-name DSL and a 35k editing dataset to let LLMs handle CAD changes at three intent levels, with clear gains on parameter edits but a noticeable drop on functional ones.


The main takeaway is that LLM4CAD-Editor introduces a DSL for naming CAD features so LLMs can edit models without coordinate fragility, plus a large dataset of instruction-program pairs across functional, operation, and parameter intents.

What stands out as new is the focus on editing rather than one-shot generation, the three-tier intent taxonomy, and the DSL-augmented dataset construction. The prior work the abstract describes mostly targets one-shot synthesis, so shifting to iterative edits with symbolic references is a practical step.

The paper does well on the low-level results. The 32B model reaches 96.3% parsing accuracy and 0.935 IoU on parameter edits, plus a 1.4x robustness improvement over Python scripting. The dataset size and the feature-name mechanism are concrete and reproducible elements.

The soft spot is the performance gap on functional edits: 82% intent satisfaction, 0.708 IoU, and 2.859 editing distance, versus 0.935 IoU and 0.176 editing distance at the parameter level. The abstract still describes the system as maintaining high accuracy and low structural errors across tasks, but the numbers suggest the DSL helps less when intent understanding is required. The lack of data-split details, baseline code, or a human-expert reference makes it harder to judge how meaningful the functional results are.

This is for HCI and CAD researchers working on LLM interfaces for design tools. Readers building editing systems would get value from the dataset and the DSL approach.

It deserves peer review because the method and empirical setup are specific enough to benefit from external checks, even if some claims need tightening around the higher-level results.

Referee Report

2 major / 1 minor

Summary. The paper introduces LLM4CAD-Editor, an LLM-based framework for instruction-guided multi-level CAD editing (parameter, operation, functional) that relies on a custom symbolic DSL (LLM4CAD-DSL) enabling feature-name entity selection rather than coordinate references. It describes construction of a 35,139-pair multimodal dataset via DSL augmentation and vision-language synthesis, fine-tuning of a 32B model, and reports 96.3% parsing accuracy (parameter), 82% intent satisfaction (functional), IoU values of 0.935/0.871/0.708 and editing distances of 0.176/0.579/2.859 across the three levels, plus a 1.4x robustness gain versus Python-based scripting.

Significance. If the empirical claims hold after clarification, the work would be significant for the HCI/CAD community by demonstrating a practical path to intent-aware iterative editing that reduces reliance on fragile coordinate reasoning. The scale of the constructed dataset and the explicit multi-level coverage constitute a concrete resource that could enable follow-on studies; the comparative robustness result, if detailed, would strengthen the case for DSL-mediated approaches over direct scripting.

major comments (2)

[Abstract] Abstract: the central claim that the system 'can reliably perform both low-level parameter modifications and high-level functional edits, maintaining high accuracy and low structural errors across diverse editing tasks' is undermined by the reported functional-level metrics (IoU 0.708, distance 2.859) being substantially weaker than parameter-level (0.935, 0.176) without any stated success threshold, variance, or human-expert baseline to justify the adjectives 'high' and 'low'.
[Abstract] Abstract (experimental results paragraph): the 1.4x robustness improvement over Python-based CAD scripting is presented as a key comparative result, yet no description is given of the baseline implementations, the exact robustness metric, or the statistical test used, making it impossible to assess whether the gain is load-bearing for the framework's advantage.
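The referee's request for success thresholds and variance could be met with a per-level summary like the one sketched below. The IoU values in the example are illustrative toy numbers, not data from the paper, and the 0.9 threshold is an arbitrary choice for demonstration.

```python
import random
import statistics


def summarize_level(ious, threshold=0.9):
    """Mean, sample standard deviation, and success rate at an IoU threshold."""
    return {
        "mean": statistics.mean(ious),
        "stdev": statistics.stdev(ious) if len(ious) > 1 else 0.0,
        "success_rate": sum(v >= threshold for v in ious) / len(ious),
    }


def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)
    means = sorted(
        statistics.mean(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    )
    lower = means[int(alpha / 2 * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


toy_ious = [1.0, 0.8, 0.9, 0.7]  # illustrative values only
print(summarize_level(toy_ious))
print(bootstrap_ci(toy_ious))
```

Reporting the success rate at a stated threshold, plus an interval around each mean, would let readers judge whether a level's IoU counts as "high" rather than relying on the adjective.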

minor comments (1)

[Abstract] Abstract: the dataset size is given as 'over 35,139', which pairs an approximate qualifier with an exact count; stating a single precise figure would improve precision.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for highlighting issues in the abstract that affect the clarity and support of our claims. We will revise the abstract to use more precise language tied directly to the reported metrics and to provide brief context for the comparative result.


Referee: [Abstract] Abstract: the central claim that the system 'can reliably perform both low-level parameter modifications and high-level functional edits, maintaining high accuracy and low structural errors across diverse editing tasks' is undermined by the reported functional-level metrics (IoU 0.708, distance 2.859) being substantially weaker than parameter-level (0.935, 0.176) without any stated success threshold, variance, or human-expert baseline to justify the adjectives 'high' and 'low'.

Authors: We agree that the qualitative descriptors 'high accuracy' and 'low structural errors' are not fully justified by the variation across levels and lack of explicit thresholds or external baselines. In the revised abstract we will replace the general claim with direct references to the per-level IoU and editing-distance values (0.935/0.176, 0.871/0.579, 0.708/2.859) so readers can assess performance themselves. We do not have variance statistics or a human-expert baseline in the current study; the multi-level results serve as the internal comparison. revision: yes
Referee: [Abstract] Abstract (experimental results paragraph): the 1.4x robustness improvement over Python-based CAD scripting is presented as a key comparative result, yet no description is given of the baseline implementations, the exact robustness metric, or the statistical test used, making it impossible to assess whether the gain is load-bearing for the framework's advantage.

Authors: We acknowledge that the abstract presents the 1.4x figure without the supporting details that appear in the experimental section of the full paper. We will revise the abstract either to qualify the claim (e.g., '1.4x improvement in robustness under the perturbation protocol described in Section 4') or to remove the numeric claim if space constraints prevent adequate context, ensuring the abstract does not assert an unsubstantiated advantage. revision: yes

Circularity Check

0 steps flagged

No circularity in empirical evaluation chain

full rationale

The paper describes construction of a multimodal CAD editing dataset via DSL-based augmentation and vision-language synthesis, followed by fine-tuning a 32B model and reporting empirical metrics (parsing accuracy, IoU, editing distance, intent satisfaction) on held-out test cases. No equations, first-principles derivations, or predictions are presented that reduce by construction to fitted inputs or self-citations. The central claims rest on independent dataset creation and standard fine-tuning/evaluation procedures rather than any self-referential loop. This is a standard empirical ML systems paper with no load-bearing theoretical steps that could exhibit circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The framework rests on the domain assumption that LLMs can reliably map natural-language feature references to geometric operations once names replace coordinates, plus the modeling choice that automatic DSL augmentation produces representative editing examples.

axioms (1)

domain assumption: Feature names in the DSL remain stable across edits and provide sufficient disambiguation for LLM reasoning.
Invoked in the description of the entity selection mechanism.
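This assumption can be checked mechanically on edited programs. The sketch below assumes each program can be reduced to a list of feature names (a hypothetical interface, since the page does not describe how LLM4CAD-DSL exposes names) and flags duplicate names and names that disappear without an explicit deletion.

```python
from collections import Counter


def check_name_stability(before_names, after_names, deleted=()):
    """List violations of the 'stable, unambiguous feature names' assumption."""
    problems = []
    # Disambiguation: every name in the edited program should be unique.
    for name, count in Counter(after_names).items():
        if count > 1:
            problems.append(f"ambiguous name {name!r} appears {count} times")
    # Stability: names should survive an edit unless it deletes them explicitly.
    expected = set(before_names) - set(deleted)
    for name in sorted(expected - set(after_names)):
        problems.append(f"name {name!r} lost or renamed by the edit")
    return problems


print(check_name_stability(
    ["base_plate", "mount_hole"],
    ["base_plate", "hole_1", "hole_1"],
))
```

Running such a check over a test set would give direct evidence for or against the axiom, instead of leaving it implicit in the entity selection mechanism.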

invented entities (1)

LLM4CAD-DSL (no independent evidence)
purpose: Structured symbolic representation that replaces coordinate-based references with named features.
New language introduced to enable the editing framework.

pith-pipeline@v0.9.1-grok · 5855 in / 1374 out tokens · 37051 ms · 2026-06-30T15:30:09.961328+00:00 · methodology

