arXiv: 2503.17192 · v3 · submitted 2025-03-21 · 💻 cs.SE

Employing Continuous Integration inspired workflows for benchmarking of scientific software -- a use case on numerical cut cell quadrature

Pith reviewed 2026-05-22 22:42 UTC · model grok-4.3

classification 💻 cs.SE
keywords: continuous integration, benchmarking, scientific software, numerical quadrature, cut cell methods, automation, workflow management

The pith

Continuous Integration tools automate benchmark execution and reporting for scientific software even as test designs evolve.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Multiple software packages often exist for the same scientific computing task, and choosing among them requires performance benchmarks on suitable test problems. As the set of packages, metrics, and test cases grows or changes during a project, manual benchmark runs become repetitive and error-prone. The paper shows that Continuous Integration tools, already common in software development, can be repurposed to run benchmarks automatically, collect results, and produce reports whenever the setup changes. A reader would care because this lowers the cost of maintaining fair comparisons across alternative numerical methods. The concrete demonstration uses quadrature on cut-cell domains bounded by curves or surfaces in two or three dimensions.

Core claim

Established Continuous Integration tools and practices achieve high automation of benchmark execution and reporting for scientific software. The approach handles the rapid expansion of the parameter space and the common need to add new libraries, adapt metrics, or introduce new benchmark cases without requiring laborious re-evaluation of all prior results by hand.

What carries the argument

Continuous Integration inspired workflows that treat benchmark definitions, executions, and reports as versioned, automatically triggered tasks.

If this is right

  • Adding a new software package requires only the definition of its build and run steps; all prior results are then re-executed and compared automatically.
  • Changes to metrics or addition of new test geometries trigger fresh runs and updated reports without separate manual intervention.
  • Results remain reproducible because every benchmark step is captured in the same version-controlled scripts used by the CI system.
  • The same workflow can compare packages that use fundamentally different discretizations on the same set of implicit or parametric domains.
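
The first two consequences above can be illustrated with a minimal sketch. It is not the paper's MATLAB driver: the registry, the package names (`polygon_64`, `polygon_256`) and the case names are hypothetical stand-ins, and the "packages" are toy estimators of a disk area rather than real quadrature libraries. The point is only that one new registration is enough for the full set of package and case combinations to be re-run.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical registry: package name -> function that estimates the area of
# a disk with radius r. Real libraries would be wrapped behind this signature.
PACKAGES: Dict[str, Callable[[float], float]] = {}

# Hypothetical benchmark cases: case name -> disk radius.
CASES: Dict[str, float] = {"disk_r0.4": 0.4, "disk_r1.0": 1.0}


def register(name: str):
    """Decorator that adds an estimator to the registry under `name`."""
    def wrap(fn: Callable[[float], float]) -> Callable[[float], float]:
        PACKAGES[name] = fn
        return fn
    return wrap


def polygon_area(r: float, n: int) -> float:
    """Area of a regular n-gon inscribed in a circle of radius r."""
    return 0.5 * n * r * r * math.sin(2.0 * math.pi / n)


@register("polygon_64")
def polygon_64(r: float) -> float:
    return polygon_area(r, 64)


@register("polygon_256")
def polygon_256(r: float) -> float:
    return polygon_area(r, 256)


def run_all() -> List[Tuple[str, str, float]]:
    """Run every registered package on every case; return relative errors."""
    rows = []
    for pkg_name, estimate in sorted(PACKAGES.items()):
        for case_name, r in sorted(CASES.items()):
            exact = math.pi * r * r
            rel_err = abs(estimate(r) - exact) / exact
            rows.append((pkg_name, case_name, rel_err))
    return rows


if __name__ == "__main__":
    for pkg_name, case_name, rel_err in run_all():
        print(f"{pkg_name:12s} {case_name:10s} {rel_err:.3e}")
```

In a CI setting, `run_all` would be the command a pipeline job invokes on each commit, so registering a further package or case changes the report without any manual re-run.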

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same pattern could reduce effort in other fields where competing numerical codes must be re-tested after each code or test-suite update.
  • Public CI configurations could serve as living benchmark suites that external contributors extend through ordinary pull requests.
  • Over time the collected data might reveal systematic performance differences between discretization families that are not visible in single-paper comparisons.

Load-bearing premise

Benchmark designs for scientific software will keep changing after the project starts, forcing repeated full evaluations of all packages and cases.

What would settle it

A side-by-side log showing that the total person-hours spent on manual benchmark maintenance for an evolving set of cut-cell quadrature cases is lower than the hours spent configuring and maintaining the equivalent Continuous Integration workflow.

Figures

Figures reproduced from arXiv: 2503.17192 by Benjamin Marussig, Chen Miao, Florian Kummer, Guilherme H. Teixeira, Irina Shiskina, Josef Kiendl, Michael Loibl, Teoman Toprak.

Figure 1: Illustration of cut elements, which are created by intersecting a – typically, …
Figure 2: Class diagram of the MATLAB control code: The unified driver interface for …
Figure 3: MATLABs abstract class for the integrator or our use case.
Figure 4: Visualization of a successful pipeline in GitLab.
Figure 5: An example of a MATLAB test report. Finally, the MATLAB scripts produce benchmarking results, e.g., tables and plots. In the GitLab pipeline, we use the artifacts feature to store these results. For each job, one can define a set of files that are stored as artifacts. Then, two kinds of things can be done: the artifacts can be downloaded from the GitLab web interface, e.g., to be used in a publication. Or, …
Figure 6: First test case. Circular disk with centre point …
Figure 7: First test case. Convergence study of the relative error of the area under uniform …
Figure 8: Second test case. Shift of circular disk from left to right with 1000 steps. The …
Figure 9: Second test case. Relative error for the problem depicted in Figure 8.
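
To make the first test case more concrete, the following is a rough sketch of a convergence study for the area of a circular disk on a background grid. It is an illustration only and does not reproduce any of the benchmarked libraries: the centre (0.5, 0.5), the radius 0.4, the unit-square domain, the grid levels, and the simple sub-sampling rule for cut cells are all assumptions, since the captions above are truncated and do not give these details.

```python
import math
from typing import List, Tuple

# Assumed geometry (not taken from the paper): disk inside the unit square.
CX, CY, RADIUS = 0.5, 0.5, 0.4


def signed_distance(x: float, y: float) -> float:
    """Signed distance to the disk boundary (negative inside the disk)."""
    return math.hypot(x - CX, y - CY) - RADIUS


def disk_area_estimate(n_cells: int, sub: int = 8) -> float:
    """Estimate the disk area on an n_cells x n_cells grid over [0, 1]^2.

    Cells that lie entirely inside contribute their full area, cells entirely
    outside are skipped, and cut cells are sampled at sub x sub midpoints.
    """
    h = 1.0 / n_cells
    half_diag = h * math.sqrt(2.0) / 2.0
    area = 0.0
    for i in range(n_cells):
        for j in range(n_cells):
            x0, y0 = i * h, j * h
            d = signed_distance(x0 + h / 2.0, y0 + h / 2.0)
            if d <= -half_diag:      # whole cell is inside the disk
                area += h * h
            elif d >= half_diag:     # whole cell is outside the disk
                continue
            else:                    # cut cell: count sample points inside
                s = h / sub
                inside = sum(
                    1
                    for a in range(sub)
                    for b in range(sub)
                    if signed_distance(x0 + (a + 0.5) * s,
                                       y0 + (b + 0.5) * s) <= 0.0
                )
                area += inside * s * s
    return area


def convergence_study(levels=(4, 8, 16, 32)) -> List[Tuple[int, float]]:
    """Relative area error for each grid level."""
    exact = math.pi * RADIUS ** 2
    return [(n, abs(disk_area_estimate(n) - exact) / exact) for n in levels]


if __name__ == "__main__":
    for n, err in convergence_study():
        print(f"n = {n:3d}  relative error = {err:.3e}")
```

The inside/outside classification relies on the level-set function being an exact signed distance. For a general implicit function that shortcut does not hold, which is one reason dedicated cut-cell quadrature methods exist.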
Original abstract

In the field of scientific computing, one often finds several alternative software packages (with open or closed source code) for solving a specific problem. These packages sometimes even use alternative methodological approaches, e.g., different numerical discretizations. If one decides to use one of these packages, it is often not clear which one is the best choice. To make an informed decision, it is necessary to measure the performance of the alternative software packages for a suitable set of test problems, i.e. to set up a benchmark. However, setting up benchmarks ad-hoc can become overwhelming as the parameter space expands rapidly. Very often, the design of the benchmark is also not fully set at the start of some project. For instance, adding new libraries, adapting metrics, or introducing new benchmark cases during the project can significantly increase complexity and necessitate laborious re-evaluation of previous results. This paper presents a proven approach that utilizes established Continuous Integration tools and practices to achieve high automation of benchmark execution and reporting. Our use case is the numerical integration (quadrature) on arbitrary domains, which are bounded by implicitly or parametrically defined curves or surfaces in 2D or 3D.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript claims that established Continuous Integration (CI) tools and practices can achieve high automation of benchmark execution and reporting for scientific software, even when benchmark designs evolve during a project (e.g., by adding libraries, adapting metrics, or introducing new cases). The use case is numerical quadrature on arbitrary 2D/3D domains bounded by implicitly or parametrically defined curves or surfaces.

Significance. If the CI-based workflow is shown to deliver the claimed automation, the paper would offer a practical methodological contribution for reproducible and maintainable benchmarking in scientific computing, addressing a common pain point of manual re-evaluation as requirements change.

major comments (1)
  1. [Abstract] Abstract: the central claim that the approach is 'proven' and achieves 'high automation' for the quadrature use case is not supported by any implementation details, quantitative results, or verification of reduced manual effort; this is load-bearing for the methodological contribution.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback. The single major comment concerns support for claims in the abstract. We address it point-by-point below and propose targeted revisions where appropriate.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim that the approach is 'proven' and achieves 'high automation' for the quadrature use case is not supported by any implementation details, quantitative results, or verification of reduced manual effort; this is load-bearing for the methodological contribution.

    Authors: The full manuscript (Sections 2–4) supplies concrete implementation details: we describe the CI pipeline configuration (YAML workflows, job matrices for 2D/3D quadrature cases, integration with cut-cell libraries, and automated result aggregation via scripts that regenerate tables and plots on each commit). The quadrature use case explicitly demonstrates handling of evolving benchmarks (adding a new library, metric, or geometry) without manual re-execution of prior cases, which is shown through before/after workflow diagrams and example repository commits. We acknowledge that the paper does not include quantitative time-and-effort measurements (e.g., person-hours before vs. after CI adoption); such metrics are inherently project-specific and were outside the scope of the methodological contribution. We will revise the abstract to replace 'proven' with 'demonstrated via a detailed use case' and add a short paragraph in the conclusions discussing the qualitative reduction in manual re-evaluation.

    revision: partial
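
The job-matrix idea mentioned in the simulated rebuttal, and the rapid growth of the parameter space noted in the abstract, can be sketched as follows. This is not the paper's pipeline configuration: the axis names and values (`lib_a`, `disk_area`, and so on) are hypothetical, and the sketch only shows how a matrix of axes expands into individual jobs and which jobs appear when one axis gains a value.

```python
from itertools import product
from typing import Dict, List


def expand_matrix(axes: Dict[str, List[str]]) -> List[Dict[str, str]]:
    """Expand named axes into one job description per combination."""
    names = list(axes)
    return [
        dict(zip(names, combo))
        for combo in product(*(axes[name] for name in names))
    ]


# Hypothetical benchmark axes; none of these names come from the paper.
before = {
    "library": ["lib_a", "lib_b"],
    "dimension": ["2D", "3D"],
    "case": ["disk_area", "moving_disk"],
}

# Adding one library only changes a single axis of the matrix.
after = {**before, "library": before["library"] + ["lib_c"]}

jobs_before = expand_matrix(before)
jobs_after = expand_matrix(after)

# Jobs that exist only because of the newly added library.
added = [job for job in jobs_after if job not in jobs_before]

if __name__ == "__main__":
    print(len(jobs_before), len(jobs_after), len(added))
    for job in added:
        print(job)
```

With these assumed axes the matrix grows from 8 to 12 jobs when a third library is added. The job count is the product of the axis lengths, so each new axis value multiplies the work, which is why ad-hoc manual runs stop scaling.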

Circularity Check

0 steps flagged

No significant circularity; methodological workflow description only

Full rationale

The paper presents a CI-inspired workflow for automating benchmark execution and reporting in scientific computing, demonstrated on cut-cell quadrature. No derivations, equations, fitted parameters, or mathematical predictions exist in the text. The central claim is a practical methodology whose validity rests on implementation experience rather than any self-referential reduction, self-citation chain, or ansatz. All steps are descriptive and externally verifiable through the described tools and practices, with no load-bearing element that collapses to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper is a workflow description in software engineering and introduces no mathematical models, free parameters, axioms, or invented entities.
