
arxiv: 2605.21334 · v1 · pith:D2HADSZ5new · submitted 2026-05-20 · 💻 cs.SE · cs.CE · physics.comp-ph

RSE of a Quantum Transport Code and its Effects

Pith reviewed 2026-05-21 03:16 UTC · model grok-4.3

classification 💻 cs.SE · cs.CE · physics.comp-ph
keywords research software engineering · Fortran · quantum transport · code quality · continuous integration · benchmarking · scientific software · memory defects

The pith

Applying RSE practices to libNEGF exposed defects common to Fortran scientific codes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reports on two years of research software engineering work on libNEGF, a Fortran code for quantum transport simulations. Practices including continuous integration, automated testing, and continuous benchmarking were applied systematically. These efforts identified serious defects such as uninitialized memory reads, out-of-bounds writes, and a misunderstanding in the mathematical model for boundary conditions. Performance regressions linked to changes in high-performance computing setups were also found. The authors use these experiences to argue that dangerous defects, similar to undefined behavior in other languages, are prevalent in Fortran-based scientific software, and that the described practices can be adopted more widely.

Core claim

The systematic use of continuous integration, automated testing, compiler warning correction, and continuous benchmarking on libNEGF over two years revealed critical defects including uninitialized memory reads, out-of-bounds writes, a misunderstood mathematical model in boundary condition handling, and performance regressions from HPC system changes. This provides data points indicating that a class of dangerous defects equivalent to undefined behavior is as prevalent in Fortran scientific codes as in other contexts.

What carries the argument

Continuous integration, automated testing, and continuous benchmarking applied to the libNEGF quantum transport code, which served to uncover hidden defects and performance issues.

If this is right

  • Other Fortran scientific codes can be improved by implementing similar continuous testing and benchmarking regimes to catch memory and model errors early.
  • Performance stability in scientific simulations can be maintained by monitoring for regressions caused by system configuration changes.
  • The practices described can be selectively adopted in both new and existing scientific software projects regardless of the programming language used.
  • Addressing misunderstood mathematical models through code review and testing enhances the correctness of simulation results.
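
The regression monitoring mentioned in the list above can be pictured with a small sketch. It is not the benchmarking setup used for libNEGF; the function name `detect_regression`, the 10% tolerance, and the runtime samples are all assumptions made for illustration. The idea is only that each benchmark run is compared against a stored baseline and flagged when the median runtime grows beyond a chosen threshold.

```python
from statistics import median


def detect_regression(baseline_s, current_s, tolerance=0.10):
    """Return True if the median of current_s exceeds the median of
    baseline_s by more than `tolerance` (a relative fraction).

    Hypothetical helper for illustration; the threshold is an assumption.
    """
    if not baseline_s or not current_s:
        raise ValueError("both runtime samples must be non-empty")
    base = median(baseline_s)
    cur = median(current_s)
    # Medians are less sensitive to a single noisy run than means.
    return cur > base * (1.0 + tolerance)


# Made-up wall-clock times in seconds, not measurements from the paper.
baseline = [12.1, 11.9, 12.0, 12.3, 12.2]
after_system_update = [13.9, 14.1, 13.8, 14.0, 14.2]

print(detect_regression(baseline, after_system_update))  # True
print(detect_regression(baseline, baseline))             # False
```

Comparing medians over several repetitions is one simple way to reduce false alarms from run-to-run noise on shared HPC systems; a real setup would likely also record the system configuration alongside each measurement so that a flagged regression can be traced to a code change or to an environment change.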

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If such defects are widespread, then many published scientific results from Fortran codes may have subtle platform-dependent inaccuracies that go undetected without these checks.
  • Extending the testing approach to codes in other domains like climate modeling or computational biology could test the generalizability of the findings.
  • Integrating domain expert review of mathematical models alongside automated tests might prevent implementation errors at the source.

Load-bearing premise

That the defects and issues discovered in libNEGF are representative of those in the wider class of Fortran scientific codes.

What would settle it

A comparable study applying the same RSE practices to several other independent Fortran scientific codes that finds markedly fewer instances of uninitialized memory reads and out-of-bounds writes would challenge the claim of prevalence.

Figures

Figures reproduced from arXiv: 2605.21334 by Christoph Conrads, Edoardo Di Napoli.

Figure 1: Parts of a git history of HPC software. The oldest commits are at the top and the arrows …
Figure 2: Various git histories. Circles indicate commits with descendants on the right. X’ and …
Figure 3: The CI configuration of libNEGF. The file was edited to fit onto one page: the C, C++ …
Figure 4: The Dockerfile for Debian containers (the file was slightly edited to fit onto one page).
Figure 5: The CI configuration of libnegf-benchmarks. The command in line 11 is a superficial …
Figure 6: The GPU activity of a job on JUWELS Booster throughout its lifetime. The abscissa …
Figure 7: This plot shows the outcome of the strong scaling runs of the libNEGF CB setup. After …
Original abstract

This paper presents our research software engineering (RSE) experiences over two years with libNEGF, a quantum transport code. We describe practical approaches to code quality assurance--including continuous integration, automated testing, and compiler warning correction--and performance engineering through continuous benchmarking. Our systematic application of these practices revealed critical defects: uninitialized memory reads, out-of-bounds writes, and notably, a misunderstood mathematical model in our boundary condition handling. We also document how continuous benchmarking exposed performance regressions caused by HPC system configuration changes. Our findings provide data points suggesting that a dangerous class of defects--equivalent to undefined behavior in C/C++ and processor-dependent behavior in Fortran--is as prevalent in Fortran scientific codes as elsewhere. While libNEGF is implemented in Fortran, most recommendations are applicable to scientific software regardless of implementation language, and they can be implemented selectively or in their entirety for both new and existing projects.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript reports on two years of research software engineering (RSE) practices applied to libNEGF, a Fortran-based quantum transport code. It details the implementation of continuous integration, automated testing, compiler warning corrections, and continuous benchmarking, which uncovered defects such as uninitialized memory reads, out-of-bounds writes, and a misunderstood boundary condition model, along with performance regressions from HPC system changes. The authors conclude that these findings suggest dangerous defects are as prevalent in Fortran scientific codes as undefined behavior in C/C++.

Significance. The paper contributes practical insights into applying RSE techniques to scientific software, which could aid developers in improving code reliability and performance. The direct experience with defect detection and performance engineering is a strength, but the broader claim about prevalence across Fortran codes requires additional comparative evidence to hold significant weight.

major comments (1)
  1. [Abstract] The statement that the findings 'provide data points suggesting that a dangerous class of defects--equivalent to undefined behavior in C/C++ and processor-dependent behavior in Fortran--is as prevalent in Fortran scientific codes as elsewhere' is not backed by comparative data or analysis from multiple codes. The manuscript is limited to experiences with libNEGF, which does not sufficiently support the generalization without additional benchmarks or surveys.
minor comments (1)
  1. The paper could benefit from a clearer discussion of the scope and limitations of generalizing from a single code base to all Fortran scientific codes.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive comments on our experience report. We address the major comment below and have revised the manuscript to clarify the scope of our claims.

Point-by-point responses
  1. Referee: [Abstract] The statement that the findings 'provide data points suggesting that a dangerous class of defects--equivalent to undefined behavior in C/C++ and processor-dependent behavior in Fortran--is as prevalent in Fortran scientific codes as elsewhere' is not backed by comparative data or analysis from multiple codes. The manuscript is limited to experiences with libNEGF, which does not sufficiently support the generalization without additional benchmarks or surveys.

    Authors: We agree that the manuscript presents an experience report on a single codebase (libNEGF) and does not include comparative data from multiple Fortran codes or surveys. The phrasing 'provide data points suggesting' was chosen to frame our findings as one concrete instance rather than a broad statistical claim. To address the concern directly, we will revise the abstract and conclusions to state that our work on libNEGF revealed critical defects that may be representative of issues in other Fortran scientific codes, while explicitly noting the single-code limitation and calling for additional studies to assess prevalence more broadly.

    revision: yes

Circularity Check

0 steps flagged

No circularity: observational experience report without derivations or self-referential predictions

Full rationale

This manuscript is a descriptive experience report on RSE practices applied to a single Fortran code (libNEGF). It documents concrete defects uncovered through CI, testing, and benchmarking, plus performance regressions from system changes. No mathematical derivation chain, fitted parameters, predictions, ansatzes, or uniqueness theorems are present. The generalization to Fortran scientific codes is explicitly framed as 'data points suggesting' prevalence 'as elsewhere' rather than a formal result derived from the inputs by construction. The central narrative remains self-contained against external benchmarks because it reports direct observations without reducing any claim to a self-citation or redefinition of its own measurements.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper is an empirical experience report on software practices and does not introduce or rely on mathematical axioms, free parameters, or invented entities.

pith-pipeline@v0.9.0 · 5682 in / 1052 out tokens · 36141 ms · 2026-05-21T03:16:25.958465+00:00 · methodology


