Algorithms and data structures for matrix-free finite element operators with MPI-parallel sparse multi-vectors

Denis Davydov; Martin Kronbichler

arxiv: 1907.01005 · v1 · pith:C7ONL5XAnew · submitted 2019-07-01 · 💻 cs.MS · cs.DS· cs.NA· math.NA· physics.comp-ph

Algorithms and data structures for matrix-free finite element operators with MPI-parallel sparse multi-vectors

Denis Davydov , Martin Kronbichler This is my paper

Pith reviewed 2026-05-25 11:30 UTC · model grok-4.3

classification 💻 cs.MS cs.DScs.NAmath.NAphysics.comp-ph

keywords matrix-free methodsfinite element operatorssparse multi-vectorsMPI parallelHamiltonian evaluationquantum mechanicsdeal.II

0 comments

The pith

An a-priori support choice for sparse multi-vectors enables matrix-free finite element operators to reach 157 GFlop/s on Intel Cascade Lake with MPI scaling.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents algorithms and data structures for sparse parallel vectors inside a finite element library. These allow matrix-free evaluation of operators through cell-wise quadrature together with sparse matrix-multivector products, inner products onto a sparse subspace, and post-multiplication by a square matrix. The approach exploits local support of the vectors to keep communication and fill-in low in an MPI setting. Performance measurements show 157 GFlop/s on Cascade Lake together with strong and weak scaling on benchmark problems that use quadratic and quartic elements.

Core claim

By fixing an a-priori support for each vector in a sparse multi-vector, the authors construct data structures that support three matrix-free operations: sparse matrix-multivector multiplication, projection of an operator onto the sparse subspace, and post-multiplication of the sparse multi-vector by a square matrix. All three operations are realized via cell-wise quadrature of the Hamiltonian without assembling global matrices. On Intel Cascade Lake this implementation delivers 157 GFlop/s; strong and weak scaling results are shown for typical benchmark problems discretized with quadratic and quartic finite-element bases.

What carries the argument

Matrix-free evaluation of the Hamiltonian operator by cell-wise quadrature, driven by an a-priori chosen support for each vector inside an MPI-parallel sparse multi-vector.

If this is right

Matrix-free SpMM, inner products, and post-multiplication become practical without assembling global sparse matrices.
Linear-scaling solution methods for localized problems become feasible inside existing finite-element libraries.
Node-level performance is limited by the roofline of the target CPU rather than by matrix assembly overhead.
The same support-based data structures apply to both quadratic and quartic bases on the reported benchmark.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same support-selection idea could be tested on other matrix-free operators that currently rely on dense vectors.
Adaptive rather than fixed support might further reduce communication if the localization pattern changes during a simulation.
The approach may transfer to other high-order discretization libraries that already use cell-wise quadrature.

Load-bearing premise

An a-priori chosen support for each vector remains small enough to keep communication volume and fill-in low during the three matrix-free operations in the MPI setting.

What would settle it

A scaling run on a larger number of MPI ranks that shows communication time or extra fill-in growing faster than the arithmetic work would falsify the efficiency claim.

Figures

Figures reproduced from arXiv: 1907.01005 by Denis Davydov, Martin Kronbichler.

**Figure 1.** Figure 1: FE mesh with quadratic Lagrange basis partitioned into 2 processes. Locally owned DoFs are depicted with crosses whereas [PITH_FULL_IMAGE:figures/full_fig_p005_1.png] view at source ↗

**Figure 2.** Figure 2: Arithmetic complexity of matrix-free and matrix-based CSR approaches for a single matrix-vector multiplication with the [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗

**Figure 3.** Figure 3: Three support domains for sparse FE vectors distributed over two MPI processes. Non-zero locally owned DoFs are depicted [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗

**Figure 4.** Figure 4: Renumbering of DoFs using Hilbert space filling curves via Algorithm [PITH_FULL_IMAGE:figures/full_fig_p010_4.png] view at source ↗

**Figure 5.** Figure 5: BCSR sparsity for the six sparse vectors with support depicted in Figure [PITH_FULL_IMAGE:figures/full_fig_p012_5.png] view at source ↗

**Figure 6.** Figure 6: The cut of the mesh for M = 320 (carbon nanotube with 160 atoms) with quadratic FE basis. Color indicates MPI partitioning into ten processes. As a benchmark problem, we look at carbon nanotubes of dierent lengths. Note that within this contribution we do not solve the DFT but only benchmark and study the required operations for the OM approach. Carbon nanotubes are chosen as a benchmark since these syste… view at source ↗

**Figure 7.** Figure 7: Roofline performance model. The dashed roof line ”w/o r/w” represents the matrix-free Algorithm [PITH_FULL_IMAGE:figures/full_fig_p020_7.png] view at source ↗

**Figure 8.** Figure 8: Breakdown of computing times of various steps of Algorithm [PITH_FULL_IMAGE:figures/full_fig_p021_8.png] view at source ↗

**Figure 9.** Figure 9: Roofline performance model on Intel Xeon Cascade Lake using the GCC compiler. The dashed roof line ”w/o r/w” represents [PITH_FULL_IMAGE:figures/full_fig_p023_9.png] view at source ↗

**Figure 10.** Figure 10: Scaling results for carbon nanotubes of dierent lengths for a fixed number of processors. Timings are the average of each [PITH_FULL_IMAGE:figures/full_fig_p024_10.png] view at source ↗

**Figure 11.** Figure 11: Strong scaling results for carbon nanotube with [PITH_FULL_IMAGE:figures/full_fig_p025_11.png] view at source ↗

**Figure 12.** Figure 12: Weak scaling results for carbon nanotubes of dierent length. Timings are the average of each operation called thirty times. [PITH_FULL_IMAGE:figures/full_fig_p025_12.png] view at source ↗

read the original abstract

Traditional solution approaches for problems in quantum mechanics scale as $\mathcal O(M^3)$, where $M$ is the number of electrons. Various methods have been proposed to address this issue and obtain linear scaling $\mathcal O(M)$. One promising formulation is the direct minimization of energy. Such methods take advantage of physical localization of the solution, namely that the solution can be sought in terms of non-orthogonal orbitals with local support. In this work a numerically efficient implementation of sparse parallel vectors within the open-source finite element library deal.II is proposed. The main algorithmic ingredient is the matrix-free evaluation of the Hamiltonian operator by cell-wise quadrature. Based on an a-priori chosen support for each vector we develop algorithms and data structures to perform (i) matrix-free sparse matrix multivector products (SpMM), (ii) the projection of an operator onto a sparse sub-space (inner products), and (iii) post-multiplication of a sparse multivector with a square matrix. The node-level performance is analyzed using a roofline model. Our matrix-free implementation of finite element operators with sparse multivectors achieves the performance of 157 GFlop/s on Intel Cascade Lake architecture. Strong and weak scaling results are reported for a typical benchmark problem using quadratic and quartic finite element bases.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Davydov and Kronbichler extend deal.II matrix-free operators to sparse multi-vectors with reported 157 GFlop/s performance and scaling, though communication efficiency depends on support choice.

read the letter

The one or two things to know are that Davydov and Kronbichler have implemented algorithms and data structures for matrix-free operations on sparse multi-vectors in deal.II, targeting linear-scaling quantum mechanics, and they report node performance of 157 GFlop/s along with strong and weak scaling on quadratic and quartic finite elements. This builds on the library's existing matrix-free quadrature approach by adding support for a-priori chosen local supports per vector. The three key operations covered are the sparse matrix multi-vector product, the projection onto the sparse subspace through inner products, and the post-multiplication with a square matrix, all done in an MPI-parallel way. The paper does well in laying out the implementation choices for these operations and then analyzing performance with a roofline model on Intel Cascade Lake hardware. The scaling results for a benchmark problem provide concrete evidence that the approach works at least for the tested cases with quadratic and quartic bases. On the soft spots, the central assumption is that the a-priori support selection avoids prohibitive communication overhead in the distributed setting. The stress-test concern is valid in principle: if the support leads to non-local overlaps that increase with more processes or higher degrees, the efficiency claims could be limited. However, since the paper includes scaling plots, those results can be used to assess whether the assumption held in practice for their setup. No mention of error bars or independent verification of the GFlop/s figure appears in the abstract, which is a minor point for an implementation paper. This paper is for researchers working on finite element methods for quantum chemistry or physics who need efficient handling of localized, sparse representations. It would be of interest to the deal.II community and others developing matrix-free codes for large-scale simulations. The work shows clear thinking on adapting known techniques to the sparse multi-vector case, so it qualifies as serious engagement with the problem. I recommend sending it for peer review, as the performance data and implementation in an open library make it worth referee scrutiny even if revisions are needed on the communication analysis.

Referee Report

1 major / 0 minor

Summary. The paper presents algorithms and data structures in the deal.II library for MPI-parallel sparse multi-vectors to enable matrix-free finite element operators. The core approach uses a-priori chosen per-vector support to implement (i) sparse matrix multivector products (SpMM), (ii) operator projection via inner products, and (iii) post-multiplication with a square matrix. Node-level performance is analyzed via roofline model, with a reported peak of 157 GFlop/s on Intel Cascade Lake; strong and weak scaling results are shown for quadratic and quartic bases on a benchmark problem.

Significance. If the performance and scaling claims hold under the stated assumptions, the work provides a practical foundation for linear-scaling O(M) methods in quantum mechanics within distributed finite-element frameworks by exploiting orbital localization. The matrix-free quadrature-based operator evaluation and explicit handling of sparse multivectors are concrete contributions that could be adopted in other libraries.

major comments (1)

[Abstract] Abstract: The central performance claim of 157 GFlop/s and the reported strong/weak scaling rest on the matrix-free SpMM, inner-product, and post-multiplication kernels remaining communication-efficient under an a-priori chosen support. No communication-volume bound, halo-exchange analysis, or fill-in estimate is supplied as a function of process count or polynomial degree; without this, the roofline numbers and scaling curves cannot be verified to demonstrate the claimed efficiency rather than being limited by MPI overhead.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their careful reading of the manuscript and for the positive assessment of its significance for linear-scaling methods in quantum mechanics. We address the single major comment below.

read point-by-point responses

Referee: [Abstract] Abstract: The central performance claim of 157 GFlop/s and the reported strong/weak scaling rest on the matrix-free SpMM, inner-product, and post-multiplication kernels remaining communication-efficient under an a-priori chosen support. No communication-volume bound, halo-exchange analysis, or fill-in estimate is supplied as a function of process count or polynomial degree; without this, the roofline numbers and scaling curves cannot be verified to demonstrate the claimed efficiency rather than being limited by MPI overhead.

Authors: We agree that the manuscript would benefit from an explicit communication analysis. The presented scaling results are empirical and rely on the a-priori localized support to ensure that halo exchanges remain limited to neighboring processes. In the revised manuscript we will add a dedicated subsection that derives communication-volume bounds and halo sizes as functions of process count and polynomial degree, using the chosen support pattern. These estimates will be compared against the observed strong- and weak-scaling curves to confirm that MPI overhead does not dominate the reported performance. revision: yes

Circularity Check

0 steps flagged

No circularity: implementation and benchmarking study

full rationale

The paper describes algorithms, data structures, and MPI-parallel implementations for sparse multi-vectors and matrix-free operators in deal.II. Performance numbers (157 GFlop/s) and scaling results are empirical measurements on Cascade Lake hardware for quadratic/quartic bases. No equations, predictions, or central claims reduce by construction to fitted inputs, self-definitions, or self-citation chains. The a-priori support choice is an explicit algorithmic assumption whose validity is tested via reported benchmarks rather than derived from itself. This is a standard self-contained engineering paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The paper is an algorithmic implementation study and introduces no new physical constants, fitted parameters, or postulated entities. It relies on standard assumptions of finite-element methods and parallel computing.

axioms (1)

domain assumption Finite-element basis functions have local support that can be exploited by an a-priori chosen vector support pattern
Invoked to justify the sparse multi-vector representation and matrix-free quadrature approach.

pith-pipeline@v0.9.0 · 5774 in / 1356 out tokens · 54590 ms · 2026-05-25T11:30:04.548401+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

62 extracted references · 62 canonical work pages

[1]

Kadir Akbudak, Oguz Selvitopi, and Cevdet Aykanat. 2018. Partitioning models for scaling parallel sparse matrix-matrix multiplication. ACM Transactions on Parallel Computing (TOPC) 4, 3 (2018), 13

work page 2018
[2]

Giovanni Alze/t_ta, Daniel Arndt, Wolfgang Bangerth, Vishal Boddu, Benjamin Brands, Denis Davydov, Rene Gassm¨oller, Timo Heister, Luca Heltai, Katharina Kormann, Martin Kronbichler, Ma/t_thias Maier, Jean-Paul Pelteret, Bruno Turcksin, and David Wells. 2018. /T_hedeal.II Library, Version 9.0. Journal of Numerical Mathematics 26, 4 (2018), 173–183. h/t_tp...

work page doi:10.1515/jnma-2018-0054 2018
[3]

Hartwig Anzt, Stanimire Tomov, and Jack Dongarra. 2015. Accelerating the LOBPCG Method on GPUs Using a Blocked Sparse Matrix Vector Product. In Proceedings of the Symposium on High Performance Computing (HPC ’15) . Society for Computer Simulation International, 75–82. h/t_tp://dl.acm.org/citation.cfm?id=2872599.2872609

work page arXiv 2015
[4]

Adams, Jed Brown, Peter Brune, Kris Buschelman, Lisandro Dalcin, Victor Eijkhout, William D

Satish Balay, Shrirang Abhyankar, Mark F. Adams, Jed Brown, Peter Brune, Kris Buschelman, Lisandro Dalcin, Victor Eijkhout, William D. Gropp, Dinesh Kaushik, Ma/t_thew G. Knepley, Lois Curfman McInnes, Karl Rupp, Barry F. Smith, Stefano Zampini, and Hong Zhang. 2015.PETSc Users Manual. Technical Report ANL-95/11 - Revision 3.6. Argonne National Laboratory

work page 2015
[5]

Gang Bao, Guanghui Hu, and Di Liu. 2012. Numerical Solution of the Kohn-Sham Equation by Finite Element Methods with an Adaptive Mesh Redistribution Technique. Journal of Scienti/f_ic Computing55, 2 (#sep# 2012), 372–391. h/t_tps://doi.org/10.1007/s10915-012-9636-1

work page doi:10.1007/s10915-012-9636-1 2012
[6]

/T_homas L. Beck. 2009.Real-Space and Multigrid Methods in Computational Chemistry . John Wiley & Sons, Inc., Hoboken, New Jersey, Chapter 5, 223–285. h/t_tps://doi.org/10.1002/9780470399545.ch5

work page doi:10.1002/9780470399545.ch5 2009
[7]

Nicolas Bock and Ma/t_t Challacombe. 2013. An optimized sparse approximate matrix multiply for matrices with decay.SIAM Journal on Scienti/f_ic Computing 35, 1 (2013), C72–C98

work page 2013
[8]

Nicolas Bock, Ma/t_t Challacombe, and Laxmikant V Kal´e. 2016. Solvers for O(N) Electronic Structure in the Strong Scaling Limit. SIAM Journal on Scienti/f_ic Computing38, 1 (2016), C1–C21

work page 2016
[9]

Urban Borˇstnik, Joost VandeVondele, Val´ery Weber, and J¨urg Hu/t_ter. 2014. Sparse matrix multiplication: /T_he distributed block-compressed sparse row library. Parallel Comput. 40, 5-6 (2014), 47–58

work page 2014
[10]

Bowler, Ian J

David R. Bowler, Ian J. Bush, and Michael J. Gillan. 2000. Practical methods for ab initio calculations on thousands of atoms. International Journal of /Q_uantum Chemistry77, 5 (2000), 831–842. Manuscript submi/t_ted to ACM 28 Davydov, D. and Kronbichler, M

work page 2000
[11]

Bowler, Tsuyoshi Miyazaki, and Michael J

David R. Bowler, Tsuyoshi Miyazaki, and Michael J. Gillan. 2001. Parallel sparse matrix multiplication for linear scaling electronic structure calculations. Computer physics communications 137, 2 (2001), 255–273

work page 2001
[12]

Aydin Buluc and John R Gilbert. 2008. Challenges and advances in parallel sparse matrix-matrix multiplication. In 37th International Conference on Parallel Processing. IEEE, 503–510

work page 2008
[13]

Steven K Burger and Weitao Yang. 2008. Linear-scaling quantum calculations using non-orthogonal localized molecular orbitals. Journal of Physics: Condensed Ma/t_ter20, 29 (2008), 294209

work page 2008
[14]

Eric J Bylaska, Michael Holst, and John H Weare. 2009. Adaptive Finite Element Method for Solving the Exact Kohn-Sham Equation of Density Functional /T_heory.Journal of Chemical /T_heory and Computation5, 4 (#apr# 2009), 937–948. h/t_tps://doi.org/10.1021/ct800350j

work page doi:10.1021/ct800350j 2009
[15]

Ma/t_t Challacombe. 2000. A general parallel sparse-blocked matrix multiply for linear scaling SCF theory.Computer physics communications 128, 1-2 (2000), 93–107

work page 2000
[16]

Denis Davydov, Tymo/f_iy Gerasimov, Jean-Paul Pelteret, and Paul Steinmann. 2017. Convergence study of the h-adaptive PUM and the hp-adaptive FEM applied to eigenvalue problems in quantum mechanics. Advanced Modeling and Simulation in Engineering Sciences 4, 1 (12 Dec 2017), 7. h/t_tps://doi.org/10.1186/s40323-017-0093-0

work page doi:10.1186/s40323-017-0093-0 2017
[17]

Denis Davydov, Timo Heister, Martin Kronbichler, and Paul Steinmann. 2018. Matrix-free locally adaptive /f_inite element solution of density- functional theory with nonorthogonal orbitals and multigrid preconditioning. Physica Status Solidi B: Basic Solid State Physics 255, 9 (2018). h/t_tps://doi.org/10.1002/pssb.201800069

work page doi:10.1002/pssb.201800069 2018
[18]

Jun Fang, Xingyu Gao, and Aihui Zhou. 2012. A Kohn-Sham equation solver based on hexahedral /f_inite elements.J. Comput. Phys. 231, 8 (2012), 3166–3180

work page 2012
[19]

Jean-Luc Fa/t_tebert and Jerry Bernholc. 2000. Towards grid-basedO(N) density-functional theory methods: Optimized nonorthogonal orbitals and multigrid acceleration. Phys. Rev. B 62 (Jul 2000), 1713–1722. Issue 3. h/t_tps://doi.org/10.1103/PhysRevB.62.1713

work page doi:10.1103/physrevb.62.1713 2000
[20]

Jean-Luc Fa/t_tebert and Francois Gygi. 2004. Linear scaling /f_irst-principles molecular dynamics with controlled accuracy. Computer Physics Communications 162, 1 (2004), 24–36

work page 2004
[21]

Jean-Luc Fa/t_tebert and Francois Gygi. 2006. Linear-scaling /f_irst-principles molecular dynamics with plane-waves accuracy.Phys. Rev. B 73 (Mar 2006), 115124. Issue 11. h/t_tps://doi.org/10.1103/PhysRevB.73.115124

work page doi:10.1103/physrevb.73.115124 2006
[22]

Hornung, and Andrew M

Jean-Luc Fa/t_tebert, Richard D. Hornung, and Andrew M. Wissink. 2007. Finite element approach for density functional theory calculations on locally-re/f_ined meshes.J. Comput. Phys. 223, 2 (#may# 2007), 759–773. h/t_tps://doi.org/10.1016/j.jcp.2006.10.013

work page doi:10.1016/j.jcp.2006.10.013 2007
[23]

Frey and D.J

J.T. Frey and D.J. Doren. 2011. TubeGen 3.4 (web-interface, h/t_tp://turin.nss.udel.edu/research/tubegenonline.html), University of Delaware, Newark, DE. (2011)

work page 2011
[24]

Giulia Galli and Michele Parrinello. 1992. Large scale electronic structure calculations. Physical review le/t_ters69, 24 (1992), 3547

work page 1992
[25]

Goringe, Eduardo Hern ´andez, Michael J

Chris M. Goringe, Eduardo Hern ´andez, Michael J. Gillan, and Ian J. Bush. 1997. Linear-scaling DFT-pseudopotential calculations on parallel computers. Computer physics communications 102, 1-3 (1997), 1–16

work page 1997
[26]

Fred G Gustavson. 1978. Two fast algorithms for sparse matrices: Multiplication and permuted transposition. ACM Transactions on Mathematical So/f_tware (TOMS)4, 3 (1978), 250–269

work page 1978
[27]

Heroux, Roscoe A

Michael A. Heroux, Roscoe A. Bartle/t_t, Vicki E. Howle, Robert J. Hoekstra, Jonathan J. Hu, Tamara G. Kolda, Richard B. Lehoucq, Kevin R. Long, Roger P. Pawlowski, Eric T. Phipps, Andrew G. Salinger, Heidi K. /T_hornquist, Ray S. Tuminaro, James M. Willenbring, Alan Williams, and Kendall S. Stanley. 2005. An overview of the Trilinos project. ACM Trans. M...

work page doi:10.1145/1089014.1089021 2005
[28]

Kikuji Hirose and Tomoya Ono. 2001. Direct minimization to generate electronic states with proper occupation numbers. Physical Review B 64, 8 (2001), 085105

work page 2001
[29]

Kikuji Hirose, Tomoya Ono, Yoshitaka Fujimoto, and Shigeru Tsukamoto. 2005. First-Principles Claculations in Real-Space Formalism . Imperial College Press, 57 Shelton Street, Covent Garden, London WC2H 9HE

work page 2005
[30]

Hohenberg and Walter Kohn

Pierre C. Hohenberg and Walter Kohn. 1964. Inhomogeneous electron gas. Physical Review 136, 3B (1964), B864–B871

work page 1964
[31]

C ¸ataly¨urek, Srinivasan Parthasarathy, and P

Changwan Hong, Aravind Sukumaran-Rajam, Bortik Bandyopadhyay, Jinsung Kim, S ¨ureyya Emre Kurt, Israt Nisa, Shivani Sabhlok, ¨Umit V. C ¸ataly¨urek, Srinivasan Parthasarathy, and P. Sadayappan. 2018. Eﬃcient Sparse-matrix Multi-vector Product on GPUs. In Proceedings of the 27th International Symposium on High-Performance Parallel and Distributed Computing...

work page arXiv 2018
[32]

Sohrab Ismail-Beigi and Tomas A. Arias. 1999. Locality of the Density Matrix in Metals, Semiconductors, and Insulators. Physical Review Le/t_ters82 (Mar 1999), 2127–2130. Issue 10. h/t_tps://doi.org/10.1103/PhysRevLe/t_t.82.2127

work page doi:10.1103/physrevle/t_t.82.2127 1999
[33]

Jeremy Kepner and John Gilbert. 2011. Graph algorithms in the language of linear algebra . SIAM

work page 2011
[34]

King-Smith and David Vanderbilt

R.D. King-Smith and David Vanderbilt. 1994. First-principles investigation of ferroelectricity in perovskite compounds. Physical Review B 49, 9 (1994), 5828

work page 1994
[35]

Walter Kohn. 1959. Analytic Properties of Bloch Waves and Wannier Functions. Physical Review 115 (Aug 1959), 809–821. Issue 4. h/t_tps: //doi.org/10.1103/PhysRev.115.809

work page doi:10.1103/physrev.115.809 1959
[36]

Walter Kohn and Lu Jeu Sham. 1965. Self-consistent equations including exchange and correlation eﬀects. Physical Review 140, 4A (1965), A1133–A1138. Manuscript submi/t_ted to ACM Algorithms and data structures for matrix-free /f_inite element operators with MPI-parallel sparse multi-vectors 29

work page 1965
[37]

Moritz Kreutzer, Georg Hager, Gerhard Wellein, Holger Fehske, and Alan R Bishop. 2014. A uni/f_ied sparse matrix data format for eﬃcient general sparse matrix-vector multiplication on modern processors with wide SIMD units. SIAM Journal on Scienti/f_ic Computing36, 5 (2014), C401–C423

work page 2014
[38]

Moritz Kreutzer, Jonas /T_hies, Melven R¨ohrig-Z¨ollner, Andreas Pieper, Faisal Shahzad, Martin Galgon, Achim Basermann, Holger Fehske, Georg Hager, and Gerhard Wellein. 2017. GHOST: building blocks for high performance sparse linear algebra on heterogeneous systems. International Journal of Parallel Programming 45, 5 (2017), 1046–1072

work page 2017
[39]

Martin Kronbichler and Katharina Kormann. 2012. A generic interface for parallel cell-based /f_inite element operator application.Computers & Fluids 63 (jun 2012), 135–147. h/t_tps://doi.org/10.1016/j.comp/f_luid.2012.04.012

work page doi:10.1016/j.comp/f_luid.2012.04.012 2012
[40]

Martin Kronbichler and Katharina Kormann. 2019. Fast matrix-free evaluation of discontinuous Galerkin /f_inite element operators.ACM Trans. Math. So/f_tware(2019). In press

work page 2019
[41]

Jianfeng Lu and Kyle /T_hicke. 2017. Orbital minimization method withl 1 regularization. J. Comput. Phys. 336 (2017), 87–103

work page 2017
[42]

Francesco Mauri and Giulia Galli. 1994. Electronic-structure calculations and molecular-dynamics simulations with linear system-size scaling. Physical Review B 50, 7 (1994), 4316

work page 1994
[43]

Michael McCourt, Barry Smith, and Hong Zhang. 2013. Eﬃcient sparse Matrix-Matrix products using colorings. SIAM J. Matrix Anal. Appl. (2013)

work page 2013
[44]

Phani Motamarri, Sambit Das, Shiva Rudraraju, Krishnendu Ghosh, Denis Davydov, and Vikram Gavini. 2019. DFT-FE - A massively parallel adaptive /f_inite-element code for large-scale density functional theory calculations.Computer Physics Communications (2019)

work page 2019
[45]

Phani Motamarri, Michael R Nowak, Kenneth Leiter, Jaroslaw Knap, and Vikram Gavini. 2012. Higher-order adaptive /f_inite-element methods for Kohn-Sham density functional theory. J. Comput. Phys. 253, 15 (#jun# 2012), 308–343

work page 2012
[46]

Kurtis L. Nusbaum. 2011. Optimizing Tpetra’s sparse matrix-matrix multiplication routine . Technical Report. Technical report, SAND2011–6036, Sandia National Laboratories, Albuquerque, NM

work page 2011
[47]

Pablo Ordej´on, David A Drabold, Richard M Martin, and Ma/t_thew P Grumbach. 1995. Linear system-size scaling methods for electronic-structure calculations. Physical Review B 51, 3 (1995), 1456

work page 1995
[48]

Pask, Barry M

John E. Pask, Barry M. Klein, Philip A. Sterne, and Chingyao Y. Fong. 2001. Finite-element methods in electronic-structure theory. Computer Physics Communications 135, 1 (2001), 1–34

work page 2001
[49]

Pask and Natarajan Sukumar

John E. Pask and Natarajan Sukumar. 2017. Partition of unity /f_inite element method for quantum mechanical materials calculations. Extreme Mechanics Le/t_ters11 (2017), 8–17. h/t_tps://doi.org/10.1016/j.eml.2016.11.003

work page doi:10.1016/j.eml.2016.11.003 2017
[50]

Md Mostofa Ali Patwary, Nadathur Rajagopalan Satish, Narayanan Sundaram, Jongsoo Park, Michael J Anderson, Satya Gautam Vadlamudi, Dipankar Das, Sergey G Pudov, Vadim O Pirogov, and Pradeep Dubey. 2015. Parallel eﬃcient sparse matrix-matrix multiplication on multicore platforms. In International Conference on High Performance Computing . Springer, 48–57

work page 2015
[51]

Liang Peng, Feng Long Gu, and Weitao Yang. 2013. Eﬀective preconditioning for ab initio ground state energy minimization with non-orthogonal localized molecular orbitals. Physical Chemistry Chemical Physics 15, 37 (2013), 15518–15527

work page 2013
[52]

L Ramdas Ram-Mohan. 2002. Finite Element and Boundary Element Applications in /Q_uantum Mechanics. Oxford University Press

work page 2002
[53]

Emanuel H Rubensson, Elias Rudberg, and Pawe l Sa lek. 2006. Sparse matrix algebra for quantum modeling of large systems. In International Workshop on Applied Parallel Computing. Springer, 90–99

work page 2006
[54]

Emanuel H Rubensson, Elias Rudberg, and Pawe l Sa lek. 2007. A hierarchic sparse matrix data structure for large-scale Hartree-Fock/Kohn-Sham calculations. Journal of computational chemistry 28, 16 (2007), 2531–2537

work page 2007
[55]

Elias Rudberg, Emanuel H Rubensson, Pawe l Sa lek, and Anastasia Kruchinina. 2018. Ergo: An open-source program for linear-scaling electronic structure calculations. So/f_twareX7 (2018), 107–111

work page 2018
[56]

Chandra Saravanan, Yihan Shao, Roi Baer, Philip N Ross, and Martin Head-Gordon. 2003. Sparse matrix multiplications for linear scaling electronic structure calculations in an atom-centered basis set using multiatom blocks. Journal of computational chemistry 24, 5 (2003), 618–622

work page 2003
[57]

Volker Schauer and Christian Linder. 2015. /T_he reduced basis method in all-electron calculations with /f_inite elements.Advances in Computational Mathematics 41, 5 (2015), 1035–1047. h/t_tps://doi.org/10.1007/s10444-014-9374-z

work page doi:10.1007/s10444-014-9374-z 2015
[58]

Jan Treibig, Georg Hager, and Gerhard Wellein. 2010. LIKWID: A lightweight performance-oriented tool suite for x86 multicore environments. In Proceedings of PSTI2010, the First International Workshop on Parallel So/f_tware Tools and Tool Infrastructures. San Diego CA

work page 2010
[59]

Eiji Tsuchida and Masaru Tsukada. 1996. Adaptive /f_inite-element method for electronic-structure calculations. Phys. Rev. B 54, 11 (sep 1996), 7602–7605. h/t_tps://doi.org/10.1103/physrevb.54.7602

work page doi:10.1103/physrevb.54.7602 1996
[60]

Eiji Tsuchida and Masaru Tsukada. 1998. Large-Scale Electronic-Structure Calculations Based on the Adaptive Finite-Element Method. Journal of the Physical Society of Japan 67, 11 (1998), 3844–3858. h/t_tps://doi.org/10.1143/JPSJ.67.3844

work page doi:10.1143/jpsj.67.3844 1998
[61]

Young, Eloy Romero, and Jose E

Toby D. Young, Eloy Romero, and Jose E. Roman. 2013. Parallel /f_inite element density functional computations exploiting grid re/f_inement and subspace recycling. In Computer Physics Communications. IEEE, 66–72. h/t_tps://doi.org/10.1109/DFT.2011.68

work page doi:10.1109/dft.2011.68 2013
[62]

Dier Zhang, Lihua Shen, Aihui Zhou, and Xin-Gao Gong. 2008. Finite element method for solving Kohn–Sham equations based on self-adaptive tetrahedral mesh. Physics Le/t_ters A372, 30 (#jul# 2008), 5071–5076. h/t_tps://doi.org/10.1016/j.physleta.2008.05.075 Manuscript submi/t_ted to ACM

work page doi:10.1016/j.physleta.2008.05.075 2008

[1] [1]

Kadir Akbudak, Oguz Selvitopi, and Cevdet Aykanat. 2018. Partitioning models for scaling parallel sparse matrix-matrix multiplication. ACM Transactions on Parallel Computing (TOPC) 4, 3 (2018), 13

work page 2018

[2] [2]

Giovanni Alze/t_ta, Daniel Arndt, Wolfgang Bangerth, Vishal Boddu, Benjamin Brands, Denis Davydov, Rene Gassm¨oller, Timo Heister, Luca Heltai, Katharina Kormann, Martin Kronbichler, Ma/t_thias Maier, Jean-Paul Pelteret, Bruno Turcksin, and David Wells. 2018. /T_hedeal.II Library, Version 9.0. Journal of Numerical Mathematics 26, 4 (2018), 173–183. h/t_tp...

work page doi:10.1515/jnma-2018-0054 2018

[3] [3]

Hartwig Anzt, Stanimire Tomov, and Jack Dongarra. 2015. Accelerating the LOBPCG Method on GPUs Using a Blocked Sparse Matrix Vector Product. In Proceedings of the Symposium on High Performance Computing (HPC ’15) . Society for Computer Simulation International, 75–82. h/t_tp://dl.acm.org/citation.cfm?id=2872599.2872609

work page arXiv 2015

[4] [4]

Adams, Jed Brown, Peter Brune, Kris Buschelman, Lisandro Dalcin, Victor Eijkhout, William D

Satish Balay, Shrirang Abhyankar, Mark F. Adams, Jed Brown, Peter Brune, Kris Buschelman, Lisandro Dalcin, Victor Eijkhout, William D. Gropp, Dinesh Kaushik, Ma/t_thew G. Knepley, Lois Curfman McInnes, Karl Rupp, Barry F. Smith, Stefano Zampini, and Hong Zhang. 2015.PETSc Users Manual. Technical Report ANL-95/11 - Revision 3.6. Argonne National Laboratory

work page 2015

[5] [5]

Gang Bao, Guanghui Hu, and Di Liu. 2012. Numerical Solution of the Kohn-Sham Equation by Finite Element Methods with an Adaptive Mesh Redistribution Technique. Journal of Scienti/f_ic Computing55, 2 (#sep# 2012), 372–391. h/t_tps://doi.org/10.1007/s10915-012-9636-1

work page doi:10.1007/s10915-012-9636-1 2012

[6] [6]

/T_homas L. Beck. 2009.Real-Space and Multigrid Methods in Computational Chemistry . John Wiley & Sons, Inc., Hoboken, New Jersey, Chapter 5, 223–285. h/t_tps://doi.org/10.1002/9780470399545.ch5

work page doi:10.1002/9780470399545.ch5 2009

[7] [7]

Nicolas Bock and Ma/t_t Challacombe. 2013. An optimized sparse approximate matrix multiply for matrices with decay.SIAM Journal on Scienti/f_ic Computing 35, 1 (2013), C72–C98

work page 2013

[8] [8]

Nicolas Bock, Ma/t_t Challacombe, and Laxmikant V Kal´e. 2016. Solvers for O(N) Electronic Structure in the Strong Scaling Limit. SIAM Journal on Scienti/f_ic Computing38, 1 (2016), C1–C21

work page 2016

[9] [9]

Urban Borˇstnik, Joost VandeVondele, Val´ery Weber, and J¨urg Hu/t_ter. 2014. Sparse matrix multiplication: /T_he distributed block-compressed sparse row library. Parallel Comput. 40, 5-6 (2014), 47–58

work page 2014

[10] [10]

Bowler, Ian J

David R. Bowler, Ian J. Bush, and Michael J. Gillan. 2000. Practical methods for ab initio calculations on thousands of atoms. International Journal of /Q_uantum Chemistry77, 5 (2000), 831–842. Manuscript submi/t_ted to ACM 28 Davydov, D. and Kronbichler, M

work page 2000

[11] [11]

Bowler, Tsuyoshi Miyazaki, and Michael J

David R. Bowler, Tsuyoshi Miyazaki, and Michael J. Gillan. 2001. Parallel sparse matrix multiplication for linear scaling electronic structure calculations. Computer physics communications 137, 2 (2001), 255–273

work page 2001

[12] [12]

Aydin Buluc and John R Gilbert. 2008. Challenges and advances in parallel sparse matrix-matrix multiplication. In 37th International Conference on Parallel Processing. IEEE, 503–510

work page 2008

[13] [13]

Steven K Burger and Weitao Yang. 2008. Linear-scaling quantum calculations using non-orthogonal localized molecular orbitals. Journal of Physics: Condensed Ma/t_ter20, 29 (2008), 294209

work page 2008

[14] [14]

Eric J Bylaska, Michael Holst, and John H Weare. 2009. Adaptive Finite Element Method for Solving the Exact Kohn-Sham Equation of Density Functional /T_heory.Journal of Chemical /T_heory and Computation5, 4 (#apr# 2009), 937–948. h/t_tps://doi.org/10.1021/ct800350j

work page doi:10.1021/ct800350j 2009

[15] [15]

Ma/t_t Challacombe. 2000. A general parallel sparse-blocked matrix multiply for linear scaling SCF theory.Computer physics communications 128, 1-2 (2000), 93–107

work page 2000

[16] [16]

Denis Davydov, Tymo/f_iy Gerasimov, Jean-Paul Pelteret, and Paul Steinmann. 2017. Convergence study of the h-adaptive PUM and the hp-adaptive FEM applied to eigenvalue problems in quantum mechanics. Advanced Modeling and Simulation in Engineering Sciences 4, 1 (12 Dec 2017), 7. h/t_tps://doi.org/10.1186/s40323-017-0093-0

work page doi:10.1186/s40323-017-0093-0 2017

[17] [17]

Denis Davydov, Timo Heister, Martin Kronbichler, and Paul Steinmann. 2018. Matrix-free locally adaptive /f_inite element solution of density- functional theory with nonorthogonal orbitals and multigrid preconditioning. Physica Status Solidi B: Basic Solid State Physics 255, 9 (2018). h/t_tps://doi.org/10.1002/pssb.201800069

work page doi:10.1002/pssb.201800069 2018

[18] [18]

Jun Fang, Xingyu Gao, and Aihui Zhou. 2012. A Kohn-Sham equation solver based on hexahedral /f_inite elements.J. Comput. Phys. 231, 8 (2012), 3166–3180

work page 2012

[19] [19]

Jean-Luc Fa/t_tebert and Jerry Bernholc. 2000. Towards grid-basedO(N) density-functional theory methods: Optimized nonorthogonal orbitals and multigrid acceleration. Phys. Rev. B 62 (Jul 2000), 1713–1722. Issue 3. h/t_tps://doi.org/10.1103/PhysRevB.62.1713

work page doi:10.1103/physrevb.62.1713 2000

[20] [20]

Jean-Luc Fa/t_tebert and Francois Gygi. 2004. Linear scaling /f_irst-principles molecular dynamics with controlled accuracy. Computer Physics Communications 162, 1 (2004), 24–36

work page 2004

[21] [21]

Jean-Luc Fa/t_tebert and Francois Gygi. 2006. Linear-scaling /f_irst-principles molecular dynamics with plane-waves accuracy.Phys. Rev. B 73 (Mar 2006), 115124. Issue 11. h/t_tps://doi.org/10.1103/PhysRevB.73.115124

work page doi:10.1103/physrevb.73.115124 2006

[22] [22]

Hornung, and Andrew M

Jean-Luc Fa/t_tebert, Richard D. Hornung, and Andrew M. Wissink. 2007. Finite element approach for density functional theory calculations on locally-re/f_ined meshes.J. Comput. Phys. 223, 2 (#may# 2007), 759–773. h/t_tps://doi.org/10.1016/j.jcp.2006.10.013

work page doi:10.1016/j.jcp.2006.10.013 2007

[23] [23]

Frey and D.J

J.T. Frey and D.J. Doren. 2011. TubeGen 3.4 (web-interface, h/t_tp://turin.nss.udel.edu/research/tubegenonline.html), University of Delaware, Newark, DE. (2011)

work page 2011

[24] [24]

Giulia Galli and Michele Parrinello. 1992. Large scale electronic structure calculations. Physical review le/t_ters69, 24 (1992), 3547

work page 1992

[25] [25]

Goringe, Eduardo Hern ´andez, Michael J

Chris M. Goringe, Eduardo Hern ´andez, Michael J. Gillan, and Ian J. Bush. 1997. Linear-scaling DFT-pseudopotential calculations on parallel computers. Computer physics communications 102, 1-3 (1997), 1–16

work page 1997

[26] [26]

Fred G Gustavson. 1978. Two fast algorithms for sparse matrices: Multiplication and permuted transposition. ACM Transactions on Mathematical So/f_tware (TOMS)4, 3 (1978), 250–269

work page 1978

[27] [27]

Heroux, Roscoe A

Michael A. Heroux, Roscoe A. Bartle/t_t, Vicki E. Howle, Robert J. Hoekstra, Jonathan J. Hu, Tamara G. Kolda, Richard B. Lehoucq, Kevin R. Long, Roger P. Pawlowski, Eric T. Phipps, Andrew G. Salinger, Heidi K. /T_hornquist, Ray S. Tuminaro, James M. Willenbring, Alan Williams, and Kendall S. Stanley. 2005. An overview of the Trilinos project. ACM Trans. M...

work page doi:10.1145/1089014.1089021 2005

[28] [28]

Kikuji Hirose and Tomoya Ono. 2001. Direct minimization to generate electronic states with proper occupation numbers. Physical Review B 64, 8 (2001), 085105

work page 2001

[29] [29]

Kikuji Hirose, Tomoya Ono, Yoshitaka Fujimoto, and Shigeru Tsukamoto. 2005. First-Principles Claculations in Real-Space Formalism . Imperial College Press, 57 Shelton Street, Covent Garden, London WC2H 9HE

work page 2005

[30] [30]

Hohenberg and Walter Kohn

Pierre C. Hohenberg and Walter Kohn. 1964. Inhomogeneous electron gas. Physical Review 136, 3B (1964), B864–B871

work page 1964

[31] [31]

C ¸ataly¨urek, Srinivasan Parthasarathy, and P

Changwan Hong, Aravind Sukumaran-Rajam, Bortik Bandyopadhyay, Jinsung Kim, S ¨ureyya Emre Kurt, Israt Nisa, Shivani Sabhlok, ¨Umit V. C ¸ataly¨urek, Srinivasan Parthasarathy, and P. Sadayappan. 2018. Eﬃcient Sparse-matrix Multi-vector Product on GPUs. In Proceedings of the 27th International Symposium on High-Performance Parallel and Distributed Computing...

work page arXiv 2018

[32] [32]

Sohrab Ismail-Beigi and Tomas A. Arias. 1999. Locality of the Density Matrix in Metals, Semiconductors, and Insulators. Physical Review Le/t_ters82 (Mar 1999), 2127–2130. Issue 10. h/t_tps://doi.org/10.1103/PhysRevLe/t_t.82.2127

work page doi:10.1103/physrevle/t_t.82.2127 1999

[33] [33]

Jeremy Kepner and John Gilbert. 2011. Graph algorithms in the language of linear algebra . SIAM

work page 2011

[34] [34]

King-Smith and David Vanderbilt

R.D. King-Smith and David Vanderbilt. 1994. First-principles investigation of ferroelectricity in perovskite compounds. Physical Review B 49, 9 (1994), 5828

work page 1994

[35] [35]

Walter Kohn. 1959. Analytic Properties of Bloch Waves and Wannier Functions. Physical Review 115 (Aug 1959), 809–821. Issue 4. h/t_tps: //doi.org/10.1103/PhysRev.115.809

work page doi:10.1103/physrev.115.809 1959

[36] [36]

Walter Kohn and Lu Jeu Sham. 1965. Self-consistent equations including exchange and correlation eﬀects. Physical Review 140, 4A (1965), A1133–A1138. Manuscript submi/t_ted to ACM Algorithms and data structures for matrix-free /f_inite element operators with MPI-parallel sparse multi-vectors 29

work page 1965

[37] [37]

Moritz Kreutzer, Georg Hager, Gerhard Wellein, Holger Fehske, and Alan R Bishop. 2014. A uni/f_ied sparse matrix data format for eﬃcient general sparse matrix-vector multiplication on modern processors with wide SIMD units. SIAM Journal on Scienti/f_ic Computing36, 5 (2014), C401–C423

work page 2014

[38] [38]

Moritz Kreutzer, Jonas /T_hies, Melven R¨ohrig-Z¨ollner, Andreas Pieper, Faisal Shahzad, Martin Galgon, Achim Basermann, Holger Fehske, Georg Hager, and Gerhard Wellein. 2017. GHOST: building blocks for high performance sparse linear algebra on heterogeneous systems. International Journal of Parallel Programming 45, 5 (2017), 1046–1072

work page 2017

[39] [39]

Martin Kronbichler and Katharina Kormann. 2012. A generic interface for parallel cell-based /f_inite element operator application.Computers & Fluids 63 (jun 2012), 135–147. h/t_tps://doi.org/10.1016/j.comp/f_luid.2012.04.012

work page doi:10.1016/j.comp/f_luid.2012.04.012 2012

[40] [40]

Martin Kronbichler and Katharina Kormann. 2019. Fast matrix-free evaluation of discontinuous Galerkin /f_inite element operators.ACM Trans. Math. So/f_tware(2019). In press

work page 2019

[41] [41]

Jianfeng Lu and Kyle /T_hicke. 2017. Orbital minimization method withl 1 regularization. J. Comput. Phys. 336 (2017), 87–103

work page 2017

[42] [42]

Francesco Mauri and Giulia Galli. 1994. Electronic-structure calculations and molecular-dynamics simulations with linear system-size scaling. Physical Review B 50, 7 (1994), 4316

work page 1994

[43] [43]

Michael McCourt, Barry Smith, and Hong Zhang. 2013. Eﬃcient sparse Matrix-Matrix products using colorings. SIAM J. Matrix Anal. Appl. (2013)

work page 2013

[44] [44]

Phani Motamarri, Sambit Das, Shiva Rudraraju, Krishnendu Ghosh, Denis Davydov, and Vikram Gavini. 2019. DFT-FE - A massively parallel adaptive /f_inite-element code for large-scale density functional theory calculations.Computer Physics Communications (2019)

work page 2019

[45] [45]

Phani Motamarri, Michael R Nowak, Kenneth Leiter, Jaroslaw Knap, and Vikram Gavini. 2012. Higher-order adaptive /f_inite-element methods for Kohn-Sham density functional theory. J. Comput. Phys. 253, 15 (#jun# 2012), 308–343

work page 2012

[46] [46]

Kurtis L. Nusbaum. 2011. Optimizing Tpetra’s sparse matrix-matrix multiplication routine . Technical Report. Technical report, SAND2011–6036, Sandia National Laboratories, Albuquerque, NM

work page 2011

[47] [47]

Pablo Ordej´on, David A Drabold, Richard M Martin, and Ma/t_thew P Grumbach. 1995. Linear system-size scaling methods for electronic-structure calculations. Physical Review B 51, 3 (1995), 1456

work page 1995

[48] [48]

Pask, Barry M

John E. Pask, Barry M. Klein, Philip A. Sterne, and Chingyao Y. Fong. 2001. Finite-element methods in electronic-structure theory. Computer Physics Communications 135, 1 (2001), 1–34

work page 2001

[49] [49]

Pask and Natarajan Sukumar

John E. Pask and Natarajan Sukumar. 2017. Partition of unity /f_inite element method for quantum mechanical materials calculations. Extreme Mechanics Le/t_ters11 (2017), 8–17. h/t_tps://doi.org/10.1016/j.eml.2016.11.003

work page doi:10.1016/j.eml.2016.11.003 2017

[50] [50]

Md Mostofa Ali Patwary, Nadathur Rajagopalan Satish, Narayanan Sundaram, Jongsoo Park, Michael J Anderson, Satya Gautam Vadlamudi, Dipankar Das, Sergey G Pudov, Vadim O Pirogov, and Pradeep Dubey. 2015. Parallel eﬃcient sparse matrix-matrix multiplication on multicore platforms. In International Conference on High Performance Computing . Springer, 48–57

work page 2015

[51] [51]

Liang Peng, Feng Long Gu, and Weitao Yang. 2013. Eﬀective preconditioning for ab initio ground state energy minimization with non-orthogonal localized molecular orbitals. Physical Chemistry Chemical Physics 15, 37 (2013), 15518–15527

work page 2013

[52] [52]

L Ramdas Ram-Mohan. 2002. Finite Element and Boundary Element Applications in /Q_uantum Mechanics. Oxford University Press

work page 2002

[53] [53]

Emanuel H Rubensson, Elias Rudberg, and Pawe l Sa lek. 2006. Sparse matrix algebra for quantum modeling of large systems. In International Workshop on Applied Parallel Computing. Springer, 90–99

work page 2006

[54] [54]

Emanuel H Rubensson, Elias Rudberg, and Pawe l Sa lek. 2007. A hierarchic sparse matrix data structure for large-scale Hartree-Fock/Kohn-Sham calculations. Journal of computational chemistry 28, 16 (2007), 2531–2537

work page 2007

[55] [55]

Elias Rudberg, Emanuel H Rubensson, Pawe l Sa lek, and Anastasia Kruchinina. 2018. Ergo: An open-source program for linear-scaling electronic structure calculations. So/f_twareX7 (2018), 107–111

work page 2018

[56] [56]

Chandra Saravanan, Yihan Shao, Roi Baer, Philip N Ross, and Martin Head-Gordon. 2003. Sparse matrix multiplications for linear scaling electronic structure calculations in an atom-centered basis set using multiatom blocks. Journal of computational chemistry 24, 5 (2003), 618–622

work page 2003

[57] [57]

Volker Schauer and Christian Linder. 2015. /T_he reduced basis method in all-electron calculations with /f_inite elements.Advances in Computational Mathematics 41, 5 (2015), 1035–1047. h/t_tps://doi.org/10.1007/s10444-014-9374-z

work page doi:10.1007/s10444-014-9374-z 2015

[58] [58]

Jan Treibig, Georg Hager, and Gerhard Wellein. 2010. LIKWID: A lightweight performance-oriented tool suite for x86 multicore environments. In Proceedings of PSTI2010, the First International Workshop on Parallel So/f_tware Tools and Tool Infrastructures. San Diego CA

work page 2010

[59] [59]

Eiji Tsuchida and Masaru Tsukada. 1996. Adaptive /f_inite-element method for electronic-structure calculations. Phys. Rev. B 54, 11 (sep 1996), 7602–7605. h/t_tps://doi.org/10.1103/physrevb.54.7602

work page doi:10.1103/physrevb.54.7602 1996

[60] [60]

Eiji Tsuchida and Masaru Tsukada. 1998. Large-Scale Electronic-Structure Calculations Based on the Adaptive Finite-Element Method. Journal of the Physical Society of Japan 67, 11 (1998), 3844–3858. h/t_tps://doi.org/10.1143/JPSJ.67.3844

work page doi:10.1143/jpsj.67.3844 1998

[61] [61]

Young, Eloy Romero, and Jose E

Toby D. Young, Eloy Romero, and Jose E. Roman. 2013. Parallel /f_inite element density functional computations exploiting grid re/f_inement and subspace recycling. In Computer Physics Communications. IEEE, 66–72. h/t_tps://doi.org/10.1109/DFT.2011.68

work page doi:10.1109/dft.2011.68 2013

[62] [62]

Dier Zhang, Lihua Shen, Aihui Zhou, and Xin-Gao Gong. 2008. Finite element method for solving Kohn–Sham equations based on self-adaptive tetrahedral mesh. Physics Le/t_ters A372, 30 (#jul# 2008), 5071–5076. h/t_tps://doi.org/10.1016/j.physleta.2008.05.075 Manuscript submi/t_ted to ACM

work page doi:10.1016/j.physleta.2008.05.075 2008