pith.

arxiv: 2503.10335 · v5 · submitted 2025-03-13 · ⚛️ physics.chem-ph

A Scalable Diagonalization Framework for Tensor-Product Bitstring Selected Configuration Interaction

Pith reviewed 2026-05-23 00:26 UTC · model grok-4.3

classification ⚛️ physics.chem-ph
keywords selected configuration interaction · tensor product bitstring · distributed computing · strongly correlated systems · full configuration interaction · parallel eigensolver · quantum chemistry methods

The pith

A tensor-product bitstring representation enables fully distributed diagonalization of selected configuration interaction spaces up to trillions of determinants.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops tensor-product bitstring selected configuration interaction (TBSCI) to overcome the memory bottleneck of SCI methods by distributing the CI vector across processes instead of replicating it. Determinants are organized in a tensor-product bitstring (TPB) structure built from selected alpha- and beta-bitstrings, paired with a bitstring-based Hamiltonian evaluation algorithm and MPI communication strategies. In full configuration interaction (FCI) stress tests, the eigensolver keeps reducing wall time for 2.6 trillion determinants up to 54,000 nodes of Fugaku. The TPB approach also yields compact wavefunctions that approach FCI accuracy with only a small fraction of determinants when bitstrings are ranked by their collective weight in a reference SCI wavefunction.
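A rough memory estimate shows why replicating the CI vector does not scale. This sketch assumes one 8-byte float64 coefficient per determinant and ignores determinant indices and the several vectors an iterative eigensolver typically keeps; these are our simplifying assumptions, not figures reported in the paper.

```python
# Back-of-the-envelope memory for ONE CI vector.
# Assumptions (ours): float64 coefficients only; no determinant indices,
# no extra eigensolver vectors, perfectly even distribution.
N_DETS = 2.6e12        # determinants in the largest benchmark
N_NODES = 54_000       # Fugaku nodes at the largest scale
BYTES_PER_COEFF = 8    # float64


def vector_bytes(n_dets: float, bytes_per_coeff: int = BYTES_PER_COEFF) -> float:
    """Bytes needed to hold one full CI vector."""
    return n_dets * bytes_per_coeff


def per_node_bytes(n_dets: float, n_nodes: int, distributed: bool) -> float:
    """Per-node footprint: a full copy if replicated, a 1/n_nodes share if distributed."""
    total = vector_bytes(n_dets)
    return total / n_nodes if distributed else total


print(f"replicated : {per_node_bytes(N_DETS, N_NODES, False) / 1e12:.1f} TB per node")
print(f"distributed: {per_node_bytes(N_DETS, N_NODES, True) / 1e6:.0f} MB per node")
```

Under these assumptions a single replicated vector needs about 20.8 TB on every node, several hundred times the 32 GiB of memory on a Fugaku node, while an even split over 54,000 nodes leaves roughly 385 MB per node per vector.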

Core claim

TBSCI organizes determinants via a tensor-product bitstring structure built from alpha- and beta-bitstrings, enabling a distributed eigensolver that evaluates the Hamiltonian without replicating the full CI vector and scales to 2.6 trillion determinants on 54,000 nodes (more than 2.5 million cores), while collective-weight selection produces TPB wavefunctions close to the FCI limit.

What carries the argument

The tensor-product bitstring (TPB) representation, which constructs the determinant space from independent selections of alpha- and beta-bitstrings for distributed computation and compact wavefunction approximation.
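The tensor-product structure can be illustrated in a few lines. This is a toy illustration, not the paper's implementation: the helper names `bitstrings` and `tpb_space` are hypothetical, and the "selection" simply keeps the first few strings in each spin sector. It shows that the determinant space is every pairing of the kept alpha- and beta-strings, so its size is the product of the two counts, with FCI as the special case in which all strings are kept.

```python
from itertools import combinations
from math import comb


def bitstrings(n_orb: int, n_elec: int) -> list:
    """All occupation bitstrings with n_elec set bits among n_orb orbitals (bit i = orbital i)."""
    return [sum(1 << i for i in occ) for occ in combinations(range(n_orb), n_elec)]


def tpb_space(alpha: list, beta: list) -> list:
    """Tensor-product space: every (alpha, beta) pairing of the kept strings."""
    return [(a, b) for a in alpha for b in beta]


n_orb, n_alpha, n_beta = 6, 3, 3
alpha_all = bitstrings(n_orb, n_alpha)   # C(6,3) = 20 alpha strings
beta_all = bitstrings(n_orb, n_beta)     # C(6,3) = 20 beta strings

# FCI is the special case that keeps every string: 20 * 20 = 400 determinants.
assert len(tpb_space(alpha_all, beta_all)) == comb(n_orb, n_alpha) * comb(n_orb, n_beta)

# A "selected" TPB space (toy selection: the first few strings of each sector).
alpha_sel, beta_sel = alpha_all[:5], beta_all[:4]
dets = tpb_space(alpha_sel, beta_sel)
print(len(dets))  # 20 = 5 * 4

# Every pairing is present, so coefficients fit a dense
# len(alpha_sel) x len(beta_sel) array indexed by string positions,
# with no per-determinant lookup table.
coeffs = [[0.0] * len(beta_sel) for _ in alpha_sel]
```

One hedged design consequence: because every alpha-beta pairing is present, the coefficients form a regular two-index array, which is presumably what makes an even distribution across processes straightforward.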

Load-bearing premise

The organization of determinants into a tensor product of selected alpha- and beta-bitstrings preserves the essential electron correlations present in the original selected configuration interaction wavefunction.

What would settle it

Running a full configuration interaction calculation on a small system and comparing its energy and wavefunction overlap to a TBSCI wavefunction built from collective-weight selected bitstrings to check for significant deviation.
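That test can be rehearsed on a toy problem. In this sketch the Hamiltonian is a random symmetric matrix on a product space (a stand-in, not a molecular Hamiltonian), its exact ground state plays the role of both FCI and the reference SCI vector, and "collective weight" is assumed to mean the marginal weight, the sum of squared coefficients over the other spin. The function `tpb_energy` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
na, nb = 12, 12  # toy numbers of alpha / beta strings


def random_sym(n: int, scale: float) -> np.ndarray:
    """Random real symmetric n x n matrix."""
    m = rng.normal(scale=scale, size=(n, n))
    return (m + m.T) / 2


# Toy Hamiltonian: separable one-spin parts plus a random alpha-beta coupling.
# Basis index of determinant (a, b) is a * nb + b (np.kron ordering).
Ha = np.diag(np.arange(na, dtype=float)) + random_sym(na, 0.3)
Hb = np.diag(np.arange(nb, dtype=float)) + random_sym(nb, 0.3)
H = np.kron(Ha, np.eye(nb)) + np.kron(np.eye(na), Hb) + random_sym(na * nb, 0.1)

# "FCI": exact ground state of the full product space.
evals, evecs = np.linalg.eigh(H)
e_fci, psi_fci = evals[0], evecs[:, 0]
C = psi_fci.reshape(na, nb)  # C[a, b]


def tpb_energy(k_a: int, k_b: int) -> tuple:
    """Variational energy and squared overlap with the exact state in the TPB
    subspace spanned by the k_a / k_b strings with the largest marginal weights."""
    w_a = (C**2).sum(axis=1)  # marginal alpha weights
    w_b = (C**2).sum(axis=0)  # marginal beta weights
    sel_a = np.argsort(w_a)[::-1][:k_a]
    sel_b = np.argsort(w_b)[::-1][:k_b]
    idx = (sel_a[:, None] * nb + sel_b[None, :]).ravel()  # all (a, b) pairs
    e_sub, v_sub = np.linalg.eigh(H[np.ix_(idx, idx)])
    psi_sub = np.zeros(na * nb)
    psi_sub[idx] = v_sub[:, 0]  # embed the subspace ground state in the full basis
    return e_sub[0], float(psi_sub @ psi_fci) ** 2


for k in (4, 8, 12):
    e, ov = tpb_energy(k, k)
    print(f"k={k:2d}  dE={e - e_fci:.2e}  |<sub|fci>|^2={ov:.4f}")
```

Because the TPB space is a subspace of the full space, its lowest eigenvalue cannot fall below the FCI energy (Cauchy interlacing), so the energy error is non-negative and vanishes once all strings are kept. A real test would replace the toy matrix with a molecular FCI Hamiltonian.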

Original abstract

Selected configuration interaction (SCI) methods are effective for treating strongly correlated electronic systems, yet their scalability has long been limited by implementations that replicate the configuration interaction (CI) vector across processes, leading to severe memory bottlenecks. Here, we present a fully distributed diagonalization framework tailored for extremely large selected determinant spaces, directly addressing this major scalability bottleneck of modern SCI methods. The method is grounded in a tensor-product bitstring (TPB) representation, in which determinants are organized through a TPB structure constructed from selected alpha- and beta-bitstrings, and is referred to as tensor-product bitstring SCI (TBSCI). An efficient TBSCI eigensolver is developed based on a novel bitstring-based Hamiltonian evaluation algorithm together with a suite of MPI communication strategies designed to improve parallel efficiency. Large-scale full configuration interaction (FCI) benchmarks, employed as communication-intensive stress tests, demonstrate that the implemented TBSCI eigensolver continues to reduce the wall time for distributed diagonalization of 2.6 trillion determinants, reaching 54,000 nodes (more than 2.5 million cores) on supercomputer Fugaku. Beyond scalability, we investigate the structural compactness of the TPB representation and show that selecting alpha- and beta-bitstrings according to their collective weights in a reference SCI wavefunction yields TPB-based wavefunctions approaching the FCI limit while using only a small fraction of determinants. These results establish TBSCI as a scalable SCI methodology and provide evidence for the intrinsic compactness of the TPB representation.
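The paper's bitstring-based Hamiltonian evaluation algorithm is not reproduced here. As a hedged illustration of the kind of primitive such algorithms build on, the sketch below applies a same-spin excitation a_p† a_q to one occupation bitstring and computes the fermionic sign with a population count. The function name `apply_excitation` and the sign convention (orbitals ordered by bit index) are our assumptions. Since the one-body part of a spin-free Hamiltonian only excites within one spin, in a TPB layout an alpha excitation changes only the alpha index and a beta excitation only the beta index.

```python
from typing import Optional, Tuple


def apply_excitation(string: int, p: int, q: int) -> Optional[Tuple[int, int]]:
    """Apply a_p^dagger a_q to a single-spin occupation bitstring.

    Returns (sign, new_string), or None if the result vanishes.
    Convention (assumed): operators are ordered by orbital index, so moving
    an operator to orbital r picks up (-1)^(occupied orbitals below r).
    """
    if not (string >> q) & 1:
        return None                                 # q empty: a_q gives zero
    n_below_q = bin(string & ((1 << q) - 1)).count("1")
    s = string & ~(1 << q)                          # annihilate orbital q
    if (s >> p) & 1:
        return None                                 # p occupied: Pauli exclusion
    n_below_p = bin(s & ((1 << p) - 1)).count("1")
    sign = -1 if (n_below_q + n_below_p) % 2 else 1
    return sign, s | (1 << p)                       # create in orbital p


print(apply_excitation(0b011, 2, 0))  # (-1, 6): orbitals {0,1} -> {1,2}
```

For p == q the function returns (1, string) whenever q is occupied, i.e. it acts as the number operator, which is a quick consistency check on the sign logic.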

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces tensor-product bitstring selected configuration interaction (TBSCI), a distributed diagonalization framework for SCI that organizes determinants via a tensor-product structure of selected alpha- and beta-bitstrings. It presents a bitstring-based Hamiltonian algorithm and MPI strategies, with large-scale benchmarks showing wall-time reduction for diagonalizing 2.6 trillion determinants on up to 54,000 nodes (2.5M+ cores) of Fugaku. It further claims that ranking alpha/beta bitstrings by collective weights from a reference SCI vector produces compact TPB spaces whose variational energies approach the FCI limit for small molecules.

Significance. If the TPB representation preserves essential correlations, the demonstrated scaling to trillions of determinants on one of the largest available supercomputers would represent a substantial advance for SCI methods on strongly correlated systems, removing the memory bottleneck of replicated CI vectors. The compactness result, if robust, could further reduce computational cost by orders of magnitude.

major comments (2)
  1. [§4] §4 (and associated figures): The claim that collective-weight ranking of marginal alpha- and beta-bitstrings produces TPB wavefunctions approaching the FCI limit rests on results for a handful of small molecules, but provides no quantitative energy errors relative to FCI, no error bars, and no direct comparison of variational energies against standard (non-tensor-product) SCI at fixed determinant count. This leaves open whether determinants important only through specific alpha-beta entanglement are systematically omitted.
  2. [Abstract, §3] Abstract and §3: The central scalability claim for the TBSCI eigensolver is supported by timing benchmarks, yet the manuscript does not report the specific molecular systems, basis sets, or reference SCI vectors used to construct the TPB structure in the 2.6-trillion-determinant runs, making it impossible to assess whether the reported wall-time improvements generalize beyond the tested cases or depend on particular selection details.
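The entanglement concern in comment 1 can be made concrete with retained weight, the sum of squared reference coefficients that a space keeps. This sketch assumes marginal-weight ranking (our reading of "collective weight") and compares a k × k TPB space with an unstructured selection of the k² individually largest determinants; `tpb_retained` and `topn_retained` are hypothetical helpers. In the constructed worst case, all weight sits on "diagonal" alpha-beta pairs, so every string has the same marginal weight.

```python
import numpy as np


def marginal_weights(C: np.ndarray):
    """Marginal alpha and beta weights of a coefficient matrix C[a, b]."""
    return (C**2).sum(axis=1), (C**2).sum(axis=0)


def tpb_retained(C: np.ndarray, k_a: int, k_b: int) -> float:
    """Weight kept by the k_a x k_b TPB space of the largest-marginal strings."""
    w_a, w_b = marginal_weights(C)
    A = np.argsort(-w_a, kind="stable")[:k_a]
    B = np.argsort(-w_b, kind="stable")[:k_b]
    return float((C[np.ix_(A, B)] ** 2).sum())


def topn_retained(C: np.ndarray, n: int) -> float:
    """Weight kept by the n individually largest determinants (unstructured selection)."""
    return float(np.sort((C**2).ravel())[::-1][:n].sum())


n, k = 16, 4
# Worst case for a product space: weight only on diagonal (a, a) pairs.
C_diag = np.eye(n) / np.sqrt(n)
print(tpb_retained(C_diag, k, k), topn_retained(C_diag, k * k))  # 0.25 1.0
```

At matched determinant count the unstructured selection always keeps at least as much weight, since it takes the largest entries outright; the gap is what the product structure costs in exchange for regular, densely indexed storage. Whether such gaps are large for real molecules is what the requested matched-count comparison would reveal.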
minor comments (2)
  1. [§4] Figure captions in §4 should explicitly state the molecules, basis sets, and determinant counts used for the compactness comparisons to allow direct reproduction.
  2. Notation for the collective-weight ranking procedure could be formalized with an equation defining the marginal weights to improve clarity.
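One possible formalization, assuming the reference SCI vector is normalized with coefficients C_{ab} for alpha string a and beta string b, and writing top_k(w) for the indices of the k largest entries of w (the paper's exact definition of collective weight may differ):

```latex
% Hypothetical formalization of marginal (collective) weights and TPB selection.
\begin{aligned}
w_\alpha(a) &= \sum_{b} |C_{ab}|^{2}, &\qquad w_\beta(b) &= \sum_{a} |C_{ab}|^{2},\\
\mathcal{A} &= \operatorname{top}_{k_\alpha}(w_\alpha), &\qquad \mathcal{B} &= \operatorname{top}_{k_\beta}(w_\beta),\\
\mathcal{D}_{\mathrm{TPB}} &= \mathcal{A} \times \mathcal{B}, &\qquad |\mathcal{D}_{\mathrm{TPB}}| &= k_\alpha k_\beta .
\end{aligned}
```

With a normalized reference vector both marginal distributions sum to one, so each weight can be read as the probability of finding that spin string in the reference wavefunction.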

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which help clarify the presentation of our results. We address each major comment below and will revise the manuscript accordingly where appropriate.

Point-by-point responses
  1. Referee: [§4] §4 (and associated figures): The claim that collective-weight ranking of marginal alpha- and beta-bitstrings produces TPB wavefunctions approaching the FCI limit rests on results for a handful of small molecules, but provides no quantitative energy errors relative to FCI, no error bars, and no direct comparison of variational energies against standard (non-tensor-product) SCI at fixed determinant count. This leaves open whether determinants important only through specific alpha-beta entanglement are systematically omitted.

    Authors: We agree that quantitative energy errors and direct comparisons to standard SCI would make the compactness claim more rigorous. The figures in §4 illustrate convergence toward FCI energies with increasing numbers of alpha/beta bitstrings, but we will add a table reporting explicit energy differences to FCI for the TPB spaces along with comparisons to standard SCI at matched determinant counts. This addition will show whether alpha-beta entanglement effects are captured and whether the collective-weight ranking systematically omits key determinants. Error bars are not relevant for these deterministic variational calculations. revision: yes

  2. Referee: [Abstract, §3] Abstract and §3: The central scalability claim for the TBSCI eigensolver is supported by timing benchmarks, yet the manuscript does not report the specific molecular systems, basis sets, or reference SCI vectors used to construct the TPB structure in the 2.6-trillion-determinant runs, making it impossible to assess whether the reported wall-time improvements generalize beyond the tested cases or depend on particular selection details.

    Authors: The 2.6-trillion-determinant benchmarks are full CI spaces used as communication stress tests (not selected spaces derived from a reference SCI vector), so no reference SCI vector is involved. We acknowledge that the manuscript should explicitly identify the molecular systems and basis sets employed for these runs. In the revision we will add this information to §3 (and note it briefly in the abstract), clarifying that the TPB structure consists of the complete set of alpha and beta bitstrings for the chosen systems to isolate the parallel performance of the eigensolver. revision: yes

Circularity Check

0 steps flagged

No significant circularity; claims rest on external benchmarks and empirical comparisons

Full rationale

The paper's core claims concern the implementation scalability of a distributed TBSCI eigensolver (demonstrated via wall-time reductions on Fugaku for 2.6 trillion determinants) and the empirical compactness of the TPB representation (shown by variational energies approaching FCI for small molecules when alpha/beta bitstrings are ranked by collective weights from a reference SCI vector). These rest on hardware benchmarks and direct energy comparisons rather than on any derivation that reduces a result to its inputs by construction. No equation equates a 'prediction' to a fitted parameter, no self-citation chain bears the central load, and the selection procedure is presented as a practical heuristic validated externally, not as a self-defining identity. The claims are therefore anchored in external benchmarks rather than in their own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The method rests on standard linear-algebra and MPI primitives plus the assumption that bitstring operations can be performed without loss of numerical stability; no new physical entities or fitted constants are introduced in the abstract.

axioms (1)
  • standard math: Standard MPI collective operations and bitstring arithmetic are numerically stable at the reported scale.
    Invoked implicitly when claiming wall-time reduction on more than 2.5 million cores.
invented entities (1)
  • Tensor-product bitstring (TPB) structure (no independent evidence)
    purpose: Organize selected determinants to enable distributed storage and Hamiltonian evaluation without full CI vector replication.
    New representational choice introduced to overcome the memory bottleneck; no independent experimental evidence is supplied in the abstract.

pith-pipeline@v0.9.0 · 5813 in / 1334 out tokens · 64576 ms · 2026-05-23T00:26:43.823995+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Absorbing Many-Body Correlations into Core-Optimized Orbitals

    quant-ph · 2026-05 · unverdicted · novelty 6.0

    COO co-optimizes orbitals with TrimCI to absorb many-body correlations into the basis, cutting determinant count by orders of magnitude for iron-sulfur clusters versus localized bases or DMRG.