Optuna vs Code Llama: Are LLMs a New Paradigm for Hyperparameter Tuning?

Arash Torabi Goodarzi; Dmitry Ignatov; Radu Timofte; Roman Kochnev; Zofia Antonina Bentyn

arxiv: 2504.06006 · v4 · pith:TQWODLWEnew · submitted 2025-04-08 · cs.LG · cs.AI · cs.NE

Roman Kochnev, Arash Torabi Goodarzi, Zofia Antonina Bentyn, Dmitry Ignatov, Radu Timofte

keywords: hyperparameter, optimization, architectures, code, image, llama, llms, manipulation

Optimal hyperparameter selection is critical for maximizing the performance of neural networks in computer vision, particularly as architectures become more complex. This work explores the use of large language models (LLMs) for hyperparameter optimization by fine-tuning a parameter-efficient version of Code Llama using LoRA. The resulting model produces accurate and computationally efficient hyperparameter recommendations across a wide range of vision architectures. Unlike traditional methods such as Optuna, which rely on resource-intensive trial-and-error procedures, our approach achieves competitive or superior Root Mean Square Error (RMSE) while substantially reducing computational overhead. Importantly, the models evaluated span image-centric tasks such as classification, detection, and segmentation, fundamental components in many image manipulation pipelines including enhancement, restoration, and style transfer. Our results demonstrate that LLM-based optimization not only rivals established Bayesian methods like Tree-structured Parzen Estimators (TPE), but also accelerates tuning for real-world applications requiring perceptual quality and low-latency processing. All generated configurations are publicly available in the LEMUR Neural Network Dataset (https://github.com/ABrain-One/nn-dataset), which serves as an open source benchmark for hyperparameter optimization research and provides a practical resource to improve training efficiency in image manipulation systems.
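The abstract compares the two approaches by Root Mean Square Error (RMSE) but does not say here which quantities the error is taken between. As a minimal sketch of the metric itself, assuming it is computed between a list of obtained values and a list of reference values, the standard definition can be written as follows. The function name `rmse` and the sample numbers are hypothetical and are not taken from the paper.

```python
import math


def rmse(predicted, reference):
    """Root Mean Square Error between two equal-length numeric sequences."""
    if len(predicted) != len(reference):
        raise ValueError("sequences must have the same length")
    if not predicted:
        raise ValueError("sequences must be non-empty")
    # Square each pairwise difference, average, then take the square root.
    squared_errors = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical values for illustration only (not results from the paper).
example_predicted = [0.5, 0.75, 1.0]
example_reference = [0.5, 0.25, 1.0]
example_rmse = rmse(example_predicted, example_reference)
print(f"RMSE: {example_rmse:.4f}")
```

Because the differences are squared before averaging, a single large miss weighs more than several small ones, so a lower RMSE points to fewer badly off configurations and not only to a better average.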

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Automating Database-Native Function Code Synthesis with LLMs
cs.DB · 2026-04 · conditional novelty 7.0

DBCooker automates synthesis of database native functions via LLM-guided characterization, coding plans, hybrid filling, and progressive validation, delivering 34.55% higher accuracy than baselines on SQLite, PostgreS...