Evaluating AI-generated code for C++, Fortran, Go, Java, Julia, Matlab, Python, R, and Rust

Hartmut Kaiser; Noujoud Nader; Patrick Diehl; Steve Brandt

arxiv: 2405.13101 · v2 · pith:LTCQKPU7 · submitted 2024-05-21 · 💻 cs.SE · cs.AI


Patrick Diehl, Noujoud Nader, Steve Brandt, Hartmut Kaiser

classification 💻 cs.SE cs.AI

keywords codes, chatgpt, code, generate, generating, languages, parallel, simple


abstract

This study evaluates the capabilities of ChatGPT versions 3.5 and 4 in generating code across a diverse range of programming languages. Our objective is to assess the effectiveness of these AI models for generating scientific programs. To this end, we asked ChatGPT to generate three distinct codes: a simple numerical integration, a conjugate gradient solver, and a parallel 1D stencil-based heat equation solver. The focus of our analysis was on the compilation, runtime performance, and accuracy of the codes. While both versions of ChatGPT successfully created codes that compiled and ran (with some help), some languages were easier for the AI to use than others (possibly because of the size of the training sets used). Parallel codes, even the simple example we chose to study here, were also difficult for the AI to generate correctly.
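To make the three benchmark problems concrete, here is a minimal Python sketch of what each one computes. It is not the prompt wording used in the study and not any of the generated codes. The function names, the periodic boundary condition, the diffusion number k = alpha*dt/dx**2, and the thread-based chunking are illustrative assumptions. Because CPython threads are limited by the GIL, the chunked heat update demonstrates the domain decomposition and per-step synchronization that a parallel solver requires, not real speedup.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def integrate_trapezoid(f, a, b, n):
    """Composite trapezoidal rule for the integral of f over [a, b] with n subintervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite A given as a list of lists."""
    n = len(b)
    x = [0.0] * n
    r = list(b)  # residual r = b - A x, with initial guess x = 0
    p = list(r)  # first search direction equals the residual
    rs_old = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if math.sqrt(rs_old) < tol:  # converged: residual norm below tolerance
            break
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rs_old / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        rs_new = sum(ri * ri for ri in r)
        # New direction is A-conjugate to the previous ones
        p = [r[i] + (rs_new / rs_old) * p[i] for i in range(n)]
        rs_old = rs_new
    return x


def heat_step_chunk(u, new_u, start, stop, k):
    """Apply the explicit 3-point stencil to indices [start, stop) (hypothetical helper)."""
    n = len(u)
    for i in range(start, stop):
        # Periodic boundaries (assumption): neighbours wrap around the ring
        new_u[i] = u[i] + k * (u[(i - 1) % n] - 2.0 * u[i] + u[(i + 1) % n])


def heat_1d(u0, k, steps, workers=4):
    """Explicit 1D heat equation; k = alpha*dt/dx**2 must satisfy k <= 0.5 for stability."""
    u = list(u0)
    n = len(u)
    # Split the domain into contiguous chunks, one per worker
    bounds = [(w * n // workers, (w + 1) * n // workers) for w in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(steps):
            new_u = [0.0] * n
            futures = [pool.submit(heat_step_chunk, u, new_u, s, e, k) for s, e in bounds]
            for fut in futures:
                fut.result()  # barrier: every chunk must finish before the next time step
            u = new_u
    return u
```

For example, `integrate_trapezoid(math.sin, 0.0, math.pi, 1000)` approximates 2, and `conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])` approximates `[1/11, 7/11]`. Reading only from the old array `u` and writing into a fresh `new_u` avoids data races between chunks; the barrier after each step is one synchronization point that a parallel version of this solver needs.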



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Can LLMs Find Bugs in Code? An Evaluation from Beginner Errors to Security Vulnerabilities in Python and C++
cs.SE 2025-08 unverdicted novelty 4.0

LLMs perform well on basic syntactic and semantic bugs in small code but struggle with complex security vulnerabilities and large production codebases.