RSE of a Quantum Transport Code and its Effects
Pith reviewed 2026-05-21 03:16 UTC · model grok-4.3
The pith
Applying RSE practices to libNEGF exposed defects common to Fortran scientific codes.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Over two years, the systematic use of continuous integration, automated testing, compiler warning correction, and continuous benchmarking on libNEGF revealed critical defects: uninitialized memory reads, out-of-bounds writes, a misunderstood mathematical model in the boundary condition handling, and performance regressions caused by HPC system configuration changes. The authors treat these as data points suggesting that a dangerous class of defects, equivalent to undefined behavior in C/C++ and processor-dependent behavior in Fortran, is as prevalent in Fortran scientific codes as elsewhere.
What carries the argument
Continuous integration, automated testing, and continuous benchmarking, applied to the libNEGF quantum transport code, uncovered hidden defects and performance issues.
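To make the automated-testing part concrete, here is a minimal sketch in Python of a numerical regression check that a CI job could run. It is not taken from libNEGF or the paper; the function name `matches_reference` and the tolerance defaults are hypothetical and chosen only for illustration.

```python
import math


def matches_reference(computed, reference, rel_tol=1e-9, abs_tol=1e-12):
    """Return True when every computed value agrees with its stored reference.

    Hypothetical helper: rel_tol and abs_tol are illustrative defaults,
    not values used by libNEGF.
    """
    if len(computed) != len(reference):
        return False
    for got, want in zip(computed, reference):
        # NaN can be a symptom of reading uninitialized memory or of a
        # failed solve, so it is rejected explicitly.
        if math.isnan(got):
            return False
        if not math.isclose(got, want, rel_tol=rel_tol, abs_tol=abs_tol):
            return False
    return True
```

A check of this kind only catches a defect if the defect changes the output on the platform where the test runs, which is one reason the paper pairs testing with compiler warnings.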
If this is right
- Other Fortran scientific codes can be improved by implementing similar continuous testing and benchmarking regimes to catch memory and model errors early.
- Performance stability in scientific simulations can be maintained by monitoring for regressions caused by system configuration changes.
- The practices described can be selectively adopted in both new and existing scientific software projects regardless of the programming language used.
- Addressing misunderstood mathematical models through code review and testing enhances the correctness of simulation results.
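The monitoring for performance regressions mentioned above could be sketched as follows. This is an assumption-laden illustration in Python, not the benchmarking setup described in the paper: the helper `is_regression` and the 10% threshold are hypothetical.

```python
from statistics import median


def is_regression(baseline_runtimes, new_runtime, tolerance=0.10):
    """Flag new_runtime if it exceeds the baseline median by more than `tolerance`.

    Hypothetical helper: the 10% default threshold is illustrative only.
    """
    if not baseline_runtimes:
        raise ValueError("need at least one baseline measurement")
    # The median is less sensitive than the mean to a single noisy run.
    baseline = median(baseline_runtimes)
    return new_runtime > baseline * (1.0 + tolerance)
```

Running such a check on every commit and also on a fixed schedule would help separate slowdowns caused by code changes from those caused by changes to the HPC system itself, since the latter appear even when the code is unchanged.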
Where Pith is reading between the lines
- If such defects are widespread, then many published scientific results from Fortran codes may have subtle platform-dependent inaccuracies that go undetected without these checks.
- Extending the testing approach to codes in other domains like climate modeling or computational biology could test the generalizability of the findings.
- Integrating domain expert review of mathematical models alongside automated tests might prevent implementation errors at the source.
Load-bearing premise
That the defects and issues discovered in libNEGF are representative of those in the wider class of Fortran scientific codes.
What would settle it
A comparable study applying the same RSE practices to several other independent Fortran scientific codes that finds markedly fewer instances of uninitialized memory reads and out-of-bounds writes would challenge the claim of prevalence.
Original abstract
This paper presents our research software engineering (RSE) experiences over two years with libNEGF, a quantum transport code. We describe practical approaches to code quality assurance--including continuous integration, automated testing, and compiler warning correction--and performance engineering through continuous benchmarking. Our systematic application of these practices revealed critical defects: uninitialized memory reads, out-of-bounds writes, and notably, a misunderstood mathematical model in our boundary condition handling. We also document how continuous benchmarking exposed performance regressions caused by HPC system configuration changes. Our findings provide data points suggesting that a dangerous class of defects--equivalent to undefined behavior in C/C++ and processor-dependent behavior in Fortran--is as prevalent in Fortran scientific codes as elsewhere. While libNEGF is implemented in Fortran, most recommendations are applicable to scientific software regardless of implementation language, and they can be implemented selectively or in their entirety for both new and existing projects.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript reports on two years of research software engineering (RSE) practices applied to libNEGF, a Fortran-based quantum transport code. It details the implementation of continuous integration, automated testing, compiler warning correction, and continuous benchmarking, which uncovered defects such as uninitialized memory reads, out-of-bounds writes, and a misunderstood boundary condition model, along with performance regressions caused by HPC system changes. The authors conclude that these findings suggest a dangerous class of defects, equivalent to undefined behavior in C/C++, is as prevalent in Fortran scientific codes as elsewhere.
Significance. The paper contributes practical insights into applying RSE techniques to scientific software, which could aid developers in improving code reliability and performance. The direct experience with defect detection and performance engineering is a strength, but the broader claim about prevalence across Fortran codes requires additional comparative evidence to hold significant weight.
major comments (1)
- [Abstract] The statement that the findings 'provide data points suggesting that a dangerous class of defects--equivalent to undefined behavior in C/C++ and processor-dependent behavior in Fortran--is as prevalent in Fortran scientific codes as elsewhere' is not backed by comparative data or analysis from multiple codes. The manuscript is limited to experiences with libNEGF, which does not sufficiently support the generalization without additional benchmarks or surveys.
minor comments (1)
- The paper could benefit from a clearer discussion of the scope and limitations of generalizing from a single code base to all Fortran scientific codes.
Simulated Author's Rebuttal
We thank the referee for their constructive comments on our experience report. We address the major comment below and have revised the manuscript to clarify the scope of our claims.
- Referee: [Abstract] The statement that the findings 'provide data points suggesting that a dangerous class of defects--equivalent to undefined behavior in C/C++ and processor-dependent behavior in Fortran--is as prevalent in Fortran scientific codes as elsewhere' is not backed by comparative data or analysis from multiple codes. The manuscript is limited to experiences with libNEGF, which does not sufficiently support the generalization without additional benchmarks or surveys.
  Authors: We agree that the manuscript presents an experience report on a single codebase (libNEGF) and does not include comparative data from multiple Fortran codes or surveys. The phrasing 'provide data points suggesting' was chosen to frame our findings as one concrete instance rather than a broad statistical claim. To address the concern directly, we will revise the abstract and conclusions to state that our work on libNEGF revealed critical defects that may be representative of issues in other Fortran scientific codes, while explicitly noting the single-code limitation and calling for additional studies to assess prevalence more broadly.
  Revision: yes
Circularity Check
No circularity: observational experience report without derivations or self-referential predictions
This manuscript is a descriptive experience report on RSE practices applied to a single Fortran code (libNEGF). It documents concrete defects uncovered through CI, testing, and benchmarking, plus performance regressions from system changes. No mathematical derivation chain, fitted parameters, predictions, ansatzes, or uniqueness theorems are present. The generalization to Fortran scientific codes is explicitly framed as 'data points suggesting' prevalence 'as elsewhere' rather than a formal result derived from the inputs by construction. The central narrative remains self-contained against external benchmarks because it reports direct observations without reducing any claim to a self-citation or redefinition of its own measurements.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, theorem reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "Our findings provide data points suggesting that a dangerous class of defects--equivalent to undefined behavior in C/C++ and processor-dependent behavior in Fortran--is as prevalent in Fortran scientific codes as elsewhere."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.