gyaradax: Local Gyrokinetics JAX Code

Eric Volkmann; Gianluca Galletti; Johannes Brandstetter

arxiv: 2604.06085 · v1 · submitted 2026-04-07 · ⚛️ physics.plasm-ph · physics.comp-ph

gyaradax: Local Gyrokinetics JAX Code

Gianluca Galletti , Eric Volkmann , Johannes Brandstetter This is my paper

Pith reviewed 2026-05-10 18:02 UTC · model grok-4.3

classification ⚛️ physics.plasm-ph physics.comp-ph

keywords gyrokineticsJAXGPU accelerationplasma turbulenceautomatic differentiationflux-tubefusion plasmasagentic workflows

0 comments

The pith

A JAX and CUDA reimplementation of local flux-tube gyrokinetics matches the GKW Fortran code on benchmarks while adding GPU acceleration and automatic differentiation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces gyaradax as a minimal JAX and CUDA solver for local flux-tube gyrokinetics based on the GKW codebase. It validates the new code against analytical cases and empirical benchmarks, demonstrating formal agreement and statistical parity with the original implementation along with a substantial speedup on GPUs. The work also describes the use of agentic workflows to translate and optimize the legacy Fortran code rapidly. This setup allows gyrokinetic models to integrate directly into machine learning pipelines, as illustrated by examples of inverse problems and sensitivity analysis in plasma physics.

Core claim

gyaradax is a JAX and CUDA solver for local flux-tube gyrokinetics that reproduces analytical solutions and benchmark results from GKW with formal agreement and statistical parity, delivers substantial GPU speedup, incorporates automatic differentiation for compatibility with optimization and ML workflows, and was built using structured agentic coding methods that enabled fast translation from complex Fortran.

What carries the argument

The JAX and CUDA reimplementation of the GKW flux-tube gyrokinetics solver equipped with automatic differentiation

If this is right

Gyrokinetic turbulence simulations become feasible at higher throughput on standard GPU hardware.
Automatic differentiation enables direct gradient computation for inverse problems and parameter optimization in fusion plasma studies.
Legacy Fortran plasma codes can be translated to modern differentiable frameworks using agentic workflows guided by unit testing.
Sensitivity analysis of plasma parameters can be performed more efficiently by leveraging built-in differentiation.
Research combining gyrokinetics with machine learning models gains a practical computational platform.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The speedup could support larger ensembles of simulations for statistical studies of turbulence statistics that were previously limited by compute time.
Similar JAX translations might be applied to other legacy codes in plasma physics to create a family of interoperable, differentiable tools.
Integration with neural network surrogates could produce hybrid models that accelerate nonlinear regime calculations while retaining physics fidelity.

Load-bearing premise

That the JAX and CUDA code faithfully reproduces the numerical methods and physics of the original GKW Fortran implementation across all relevant regimes without introducing undetected discrepancies.

What would settle it

A side-by-side run of gyaradax and GKW on one of the paper's validation benchmark cases that yields a statistically significant difference in a key output quantity such as turbulent heat flux or linear growth rate.

Figures

Figures reproduced from arXiv: 2604.06085 by Eric Volkmann, Gianluca Galletti, Johannes Brandstetter.

**Figure 1.** Figure 1: Nonlinear Poisson bracket. Left: GKW implementation with 4 inverse FFTs. Right: CUDA LTO callbacks fuse all preand post-FFT work, with Z2Z packing reducing the iFFTs to 2. complex (C2C) transform, with a Hermitian symmetrization correction at ky=0 to prevent channel leakage from gyroaveraging asymmetry (Appendix D.2). CUDA backend. For the linear RHS, GKW and the JAX backend evaluate the nine-point paral… view at source ↗

**Figure 2.** Figure 2: Empirical statistical validation of gyaradax against GKW for two adiabatic ITG equilibria. Top: heat and momentum flux time traces. Both codes coincide during linear growth, but once the nonlinear effects kick-in, trajectories diverge due to chaotic sensitivity. Three runs were performed for both gyaradax and GKW, with the variance across them reported as shaded bounds. Bottom: time-averaged ky and kx spec… view at source ↗

**Figure 5.** Figure 5: Cyclone Base Case (linear). (a) Growth rate vs kθρs (R/LT =6.9). (b) Growth rate vs R/LT (kθρs=0.5). 4.3. Experiments Inverse Problem. A direct payoff of end-to-end differentiability is gradient-based inverse problems. We demonstrate this by recovering the temperature gradient R/LT from a target electrostatic potential ϕ ∗ . We fix the equilibrium parameters (q=1.4, sˆ=0.78, ε=0.19, R/Ln=2.22) and use a … view at source ↗

**Figure 6.** Figure 6: Recovery of R/LT from the electrostatic potential. Left: initial, target (×), and recovered ky spectra (top), R/LT convergence during iterations (bottom). Right: loss landscape L(R/LT ) (top) and AD gradient (bottom), with the Adam trajectory overlaid. 5 [PITH_FULL_IMAGE:figures/full_fig_p005_6.png] view at source ↗

**Figure 7.** Figure 7: Growth rate sensitivity to R/LT (ky=0 zonal mode is masked). Top: per-ky growth rate γ(ky, R/LT ). Bottom: sensitivity ∂γ/∂(R/LT ). confirming that the optimizer recovers both the instability drive and the eigenmode structure. The loss landscape (Figure 6, right), obtained by evaluating L and ∂L/∂(R/LT ) over R/LT ∈ [2, 12], is smooth and unimodal, with a clean zero crossing of the gradient at the optimum… view at source ↗

**Figure 8.** Figure 8: Overview of our memory optimization strategies. E. Comparison with GKW Reference Code While gyaradax aligns with the local flux-tube physics of the GKW framework, several advanced features of the Fortran reference code are identified as future enhancements. 16 [PITH_FULL_IMAGE:figures/full_fig_p016_8.png] view at source ↗

**Figure 9.** Figure 9: Heat flux time traces for all adiabatic validation configurations. Each panel shows gyaradax (cyan) versus GKW (black markers) for a distinct combination of q, sˆ, R/LT , and R/Ln. 18 [PITH_FULL_IMAGE:figures/full_fig_p018_9.png] view at source ↗

**Figure 10.** Figure 10: Time-averaged ky (top) and kx (bottom) spectral profiles for all adiabatic validation configurations. gyaradax (purple lines) versus GKW (black markers). 19 [PITH_FULL_IMAGE:figures/full_fig_p019_10.png] view at source ↗

read the original abstract

Gyrokinetic simulations are essential for understanding and controlling turbulence in fusion plasmas, yet they are oftentimes implemented in legacy codebases, in many cases CPU-bound. These are both hard to maintain and especially incompatible with optimization and ML workflows. gyaradax is a minimal JAX/CUDA solver for local flux-tube gyrokinetics. We base our implementation on GKW (Peeters et al., 2009), but with added native GPU acceleration and automatic differentiation. We validate gyaradax against analytical cases and empirical benchmarks, achieving formal agreement and statistical parity with GKW alongside a substantial speedup. We deliberately and extensively utilized agentic workflows in this project. A key contribution is showing that coding agents, guided by human expertise, structured prompting, and measurable progress through unit testing enabled extremely fast translation of complex Fortran code, and further optimizations. Gyaradax facilitates research at the intersection of ML and plasma physics. We showcase this through practical examples in inverse problems and sensitivity analysis.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

gyaradax ports GKW to JAX with GPU and autodiff, which is useful for ML-plasma work, but the validation numbers are missing so the parity claim stays unproven.

read the letter

The punchline is that this is a JAX/CUDA reimplementation of the local flux-tube gyrokinetic model from GKW, adding native autodiff and GPU acceleration on top of the standard physics. That setup directly supports sensitivity analysis and inverse problems, which is the practical win for fusion researchers who want to couple gyrokinetics to optimization loops without rewriting everything themselves. The agentic coding workflow they describe is a secondary but real detail: they show how structured prompting plus unit tests let them translate the Fortran core quickly, and that process itself might be worth noting for other groups facing legacy code.

Referee Report

3 major / 3 minor

Summary. The manuscript introduces gyaradax, a minimal JAX/CUDA local flux-tube gyrokinetic solver ported from GKW (Peeters et al., 2009). It claims formal agreement and statistical parity with GKW on analytical cases and empirical benchmarks, substantial GPU speedup, native automatic differentiation, and applications to inverse problems and sensitivity analysis in plasma physics. The work also emphasizes the use of agentic AI workflows for rapid Fortran-to-JAX translation guided by unit testing.

Significance. If the reproduction claims hold with quantified verification, gyaradax would provide a differentiable, GPU-accelerated gyrokinetic tool that lowers barriers to ML integration in fusion turbulence research. The agentic workflow demonstration could inform code-porting practices in computational physics more broadly.

major comments (3)

[Abstract and §4 (Validation/Benchmarks)] Abstract and validation claims: The assertions of 'formal agreement and statistical parity' with GKW lack any reported quantitative metrics (e.g., relative errors in linear growth rates, nonlinear fluxes, or L2 norms), tolerance thresholds, error bars, or tables of direct comparisons. In gyrokinetics, even sub-percent systematic offsets can alter saturation levels, so this is load-bearing for the central validation claim.
[§4 (Validation/Benchmarks)] Benchmark coverage: No statement of the verified parameter space (k_y range, beta, collisionality, or other regimes), exclusion criteria for test cases, or coverage of key numerical components (finite-difference/spectral discretization, velocity quadrature, parallel boundaries, collision operator, time integrator) is provided. This leaves open the possibility of undetected discrepancies in untested regimes.
[§3 (Implementation)] Numerical fidelity: The manuscript does not explicitly confirm that all GKW-specific normalizations, floating-point conventions, and implementation details were reproduced exactly in the JAX port, which is required to support the 'drop-in' and 'parity' claims.

minor comments (3)

[§2 or §5] Expand the description of the agentic workflow methodology (prompting strategies, unit-test metrics, human oversight) in the main text rather than leaving it primarily in the abstract.
[Introduction] Add explicit references to the original GKW papers and any prior gyrokinetic JAX or differentiable codes for context.
[Figures] Ensure all comparison figures include error bars, axis labels with units, and captions that state the exact quantities plotted and the tolerance used for 'agreement'.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their careful reading and constructive major comments, which identify key areas where the validation and implementation sections require greater rigor and transparency. We respond to each point below and will incorporate the necessary clarifications and additions in the revised manuscript.

read point-by-point responses

Referee: [Abstract and §4 (Validation/Benchmarks)] Abstract and validation claims: The assertions of 'formal agreement and statistical parity' with GKW lack any reported quantitative metrics (e.g., relative errors in linear growth rates, nonlinear fluxes, or L2 norms), tolerance thresholds, error bars, or tables of direct comparisons. In gyrokinetics, even sub-percent systematic offsets can alter saturation levels, so this is load-bearing for the central validation claim.

Authors: We agree that explicit quantitative metrics are required to support the validation claims. In the revised manuscript we will add a comparison table in §4 (and reference it in the abstract) that reports relative errors for linear growth rates (typically <0.5% across the tested k_y) and time-averaged nonlinear fluxes, together with the precise tolerance thresholds used to define formal agreement (<1% difference) and statistical parity (overlap within 1σ from ensemble runs with varied initial perturbations). Error bars from multiple realizations will be shown for the nonlinear cases. revision: yes
Referee: [§4 (Validation/Benchmarks)] Benchmark coverage: No statement of the verified parameter space (k_y range, beta, collisionality, or other regimes), exclusion criteria for test cases, or coverage of key numerical components (finite-difference/spectral discretization, velocity quadrature, parallel boundaries, collision operator, time integrator) is provided. This leaves open the possibility of undetected discrepancies in untested regimes.

Authors: We will expand §4 with an explicit statement of the verified parameter space, including k_y ∈ [0.05, 2.0], β ∈ [0, 1], and collisionality from collisionless to moderate values. We will also list the specific benchmark cases drawn from the GKW literature, note that exclusion was limited to cases lacking published GKW reference data, and confirm that the chosen tests collectively exercise the finite-difference/spectral operators, velocity quadrature, parallel boundary conditions, collision operator, and time integrator. revision: yes
Referee: [§3 (Implementation)] Numerical fidelity: The manuscript does not explicitly confirm that all GKW-specific normalizations, floating-point conventions, and implementation details were reproduced exactly in the JAX port, which is required to support the 'drop-in' and 'parity' claims.

Authors: We will add a dedicated paragraph in §3 that reproduces the GKW normalizations (time to c_s/R, lengths to ρ_s, etc.) and states that the JAX implementation adopts identical discretization, boundary conditions, and operator ordering. On floating-point conventions we note that gyaradax defaults to 32-bit precision for GPU performance, matching GKW’s common usage; we will include a short verification that any rounding differences remain well below the tolerance thresholds reported in the new comparison table. revision: partial

Circularity Check

0 steps flagged

No circularity: software reimplementation and external benchmark validation

full rationale

The manuscript presents a JAX/CUDA port of the existing GKW gyrokinetic solver (Peeters et al. 2009) together with GPU acceleration, automatic differentiation, and validation against analytical test cases plus empirical benchmarks. No derivations, ansatzes, fitted parameters, or predictions are claimed; the central assertion is faithful numerical reproduction of an external reference code, supported by direct comparison rather than self-referential construction. The single external citation to the original GKW paper supplies the independent baseline and does not reduce to any input of the present work. Consequently the derivation chain is empty and the paper is self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The paper is a software reimplementation of an established model; it introduces no new free parameters, invented entities, or ad-hoc axioms beyond the domain assumptions already present in GKW.

axioms (1)

domain assumption The local flux-tube gyrokinetic equations and numerical methods implemented in GKW (Peeters et al., 2009) are correct and complete for the regimes tested.
The implementation is explicitly based on GKW without re-deriving the underlying physics.

pith-pipeline@v0.9.0 · 5472 in / 1220 out tokens · 88441 ms · 2026-05-10T18:02:11.295897+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

10 extracted references · 10 canonical work pages

[1]

vibe coding

doi: 10.1063/1.873896. Duston, T., Xin, S., Sun, Y ., Zan, D., Li, A., Xin, S., Shen, K., Chen, Y ., Sun, Q., Zhang, G., Liu, J., Zhou, H., Liu, J., Pu, Z., Wang, Y ., Ge, B.-X., Tong, X., Ye, F., Zhao, Z.-C., Han, W.-B., Cao, Z., Zhao, Y ., Ren, W., Long, Q., Liu, Y ., Huang, A., Du, Y ., Rong, Y ., and Peng, J. Ainsteinbench: Benchmarking coding agents ...

work page doi:10.1063/1.873896 2025
[2]

Accessed 2026-03-31

URL https://x.com/karpathy/status/ 1886192184808149383. Accessed 2026-03-31. 7 gyaradax: Local Gyrokinetics JAX Code Kelling, J., Bolea, V ., Bussmann, M., Checkervarty, A., Debus, A., Ebert, J., Eisenhauer, G., Gutta, V ., Kessel- heim, S., Klasky, S., Pandit, V ., Pausch, R., Podhorszki, N., Poschel, F., Rogers, D., Rustamov, J., Schmerler, S., Schramm,...

work page 2026
[3]

doi: https://doi.org/10.1146/ annurev-fluid-120710-101223

ISSN 1545-4479. doi: https://doi.org/10.1146/ annurev-fluid-120710-101223. McGreivy, N., Hudson, S., and Zhu, C. Optimized finite- build stellarator coils using automatic differentiation.Nu- clear Fusion, 61(2):026020, January 2021. ISSN 1741-

work page 2021
[4]

dzetadeps

doi: 10.1088/1741-4326/abcd76. URL http: //dx.doi.org/10.1088/1741-4326/abcd76. NVIDIA Corporation.cuFFT Library User’s Guide: Callback Routines, 2026. URL https: //docs.nvidia.com/cuda/archive/12.2. 1/cufft/ltoea/usage/api_usage.html. Accessed: 2026-03-30. OpenAI. Gpt-5.3-codex. https://openai.com/ index/introducing-gpt-5-3-codex/ , 2026. Accessed: 2026....

work page doi:10.1088/1741-4326/abcd76 2026
[5]

Each packed spectral element is read from HBM exactly once; the callback returns the fully processed value directly to cuFFT’s butterfly registers

Inverse FFT load callback:Fuses the scatter-to-dense-grid, spectral derivative multiplication ( ikx, iky), gyro- averaging (Bessel J0 multiplication), and amplitude scaling into a single register-only computation. Each packed spectral element is read from HBM exactly once; the callback returns the fully processed value directly to cuFFT’s butterfly registers

work page
[6]

Forward FFT load callback:Fuses the Poisson bracket computation. Instead of materializing four real-space gradient arrays and computing ∂yϕ ∂xf−∂ xϕ ∂yf in a separate kernel, the callback reads the gradient arrays and computes the bracket in-flight, returning the result to the forward FFT without an intermediate HBM round-trip

work page
[7]

Forward FFT store callback:Writes directly to the packed output spectrum, skipping the 59% of dealiased modes that would otherwise be written and immediately discarded. D.3. Two-for-One Spectral Packing To further reduce the cuFFT overhead, the reasoning model proposed exploiting the linearity of the discrete Fourier transform (DFT). If two signals A and ...

work page 1998
[8]

Y ou MUST fully load the following specific files into your context, and explore for any other useful ones: •gkw ref/src/gkw.f90(Main loop over large time steps, normalisation)

INITIAL CONTEXT INGESTION:Before starting any code translation or detailed planning, explore the gkw ref/src (Fortran code) and gkw ref/manual (LaTeX files) directories. Y ou MUST fully load the following specific files into your context, and explore for any other useful ones: •gkw ref/src/gkw.f90(Main loop over large time steps, normalisation). •gkw ref/...

work page
[9]

Proceed to this step ONL Y after all prior tests are passing.Stop and wait for user approval

IMPLEMENT LINEAR gksolve:Write the gksolve function for the LINEAR case, based on the notes you created (Terms I, II, IV, V, VII, VIII, Diffusion). Proceed to this step ONL Y after all prior tests are passing.Stop and wait for user approval

work page
[10]

Proceed to this step ONL Y after all prior tests are passing, ESPECIALL Y@test linear.py

IMPLEMENT NONLINEAR gksolve:Finalize gksolve for the NONLINEAR case by extending the LINEAR version with Term III. Proceed to this step ONL Y after all prior tests are passing, ESPECIALL Y@test linear.py. TO CONCLUDE, make sure that both @test linear.py and @test nonlinear.pyPASS.Report the final test results. 20

work page

[1] [1]

vibe coding

doi: 10.1063/1.873896. Duston, T., Xin, S., Sun, Y ., Zan, D., Li, A., Xin, S., Shen, K., Chen, Y ., Sun, Q., Zhang, G., Liu, J., Zhou, H., Liu, J., Pu, Z., Wang, Y ., Ge, B.-X., Tong, X., Ye, F., Zhao, Z.-C., Han, W.-B., Cao, Z., Zhao, Y ., Ren, W., Long, Q., Liu, Y ., Huang, A., Du, Y ., Rong, Y ., and Peng, J. Ainsteinbench: Benchmarking coding agents ...

work page doi:10.1063/1.873896 2025

[2] [2]

Accessed 2026-03-31

URL https://x.com/karpathy/status/ 1886192184808149383. Accessed 2026-03-31. 7 gyaradax: Local Gyrokinetics JAX Code Kelling, J., Bolea, V ., Bussmann, M., Checkervarty, A., Debus, A., Ebert, J., Eisenhauer, G., Gutta, V ., Kessel- heim, S., Klasky, S., Pandit, V ., Pausch, R., Podhorszki, N., Poschel, F., Rogers, D., Rustamov, J., Schmerler, S., Schramm,...

work page 2026

[3] [3]

doi: https://doi.org/10.1146/ annurev-fluid-120710-101223

ISSN 1545-4479. doi: https://doi.org/10.1146/ annurev-fluid-120710-101223. McGreivy, N., Hudson, S., and Zhu, C. Optimized finite- build stellarator coils using automatic differentiation.Nu- clear Fusion, 61(2):026020, January 2021. ISSN 1741-

work page 2021

[4] [4]

dzetadeps

doi: 10.1088/1741-4326/abcd76. URL http: //dx.doi.org/10.1088/1741-4326/abcd76. NVIDIA Corporation.cuFFT Library User’s Guide: Callback Routines, 2026. URL https: //docs.nvidia.com/cuda/archive/12.2. 1/cufft/ltoea/usage/api_usage.html. Accessed: 2026-03-30. OpenAI. Gpt-5.3-codex. https://openai.com/ index/introducing-gpt-5-3-codex/ , 2026. Accessed: 2026....

work page doi:10.1088/1741-4326/abcd76 2026

[5] [5]

Each packed spectral element is read from HBM exactly once; the callback returns the fully processed value directly to cuFFT’s butterfly registers

Inverse FFT load callback:Fuses the scatter-to-dense-grid, spectral derivative multiplication ( ikx, iky), gyro- averaging (Bessel J0 multiplication), and amplitude scaling into a single register-only computation. Each packed spectral element is read from HBM exactly once; the callback returns the fully processed value directly to cuFFT’s butterfly registers

work page

[6] [6]

Forward FFT load callback:Fuses the Poisson bracket computation. Instead of materializing four real-space gradient arrays and computing ∂yϕ ∂xf−∂ xϕ ∂yf in a separate kernel, the callback reads the gradient arrays and computes the bracket in-flight, returning the result to the forward FFT without an intermediate HBM round-trip

work page

[7] [7]

Forward FFT store callback:Writes directly to the packed output spectrum, skipping the 59% of dealiased modes that would otherwise be written and immediately discarded. D.3. Two-for-One Spectral Packing To further reduce the cuFFT overhead, the reasoning model proposed exploiting the linearity of the discrete Fourier transform (DFT). If two signals A and ...

work page 1998

[8] [8]

Y ou MUST fully load the following specific files into your context, and explore for any other useful ones: •gkw ref/src/gkw.f90(Main loop over large time steps, normalisation)

INITIAL CONTEXT INGESTION:Before starting any code translation or detailed planning, explore the gkw ref/src (Fortran code) and gkw ref/manual (LaTeX files) directories. Y ou MUST fully load the following specific files into your context, and explore for any other useful ones: •gkw ref/src/gkw.f90(Main loop over large time steps, normalisation). •gkw ref/...

work page

[9] [9]

Proceed to this step ONL Y after all prior tests are passing.Stop and wait for user approval

IMPLEMENT LINEAR gksolve:Write the gksolve function for the LINEAR case, based on the notes you created (Terms I, II, IV, V, VII, VIII, Diffusion). Proceed to this step ONL Y after all prior tests are passing.Stop and wait for user approval

work page

[10] [10]

Proceed to this step ONL Y after all prior tests are passing, ESPECIALL Y@test linear.py

IMPLEMENT NONLINEAR gksolve:Finalize gksolve for the NONLINEAR case by extending the LINEAR version with Term III. Proceed to this step ONL Y after all prior tests are passing, ESPECIALL Y@test linear.py. TO CONCLUDE, make sure that both @test linear.py and @test nonlinear.pyPASS.Report the final test results. 20

work page