cTreeBalls: a fast 3-point correlation function code for clustering measurements
Pith reviewed 2026-05-10 17:41 UTC · model grok-4.3
The pith
cTreeBalls computes three-point correlation functions for more than 200 million HEALPix pixels in under 10 minutes on one node.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
cTreeBalls builds upon octree and kd-tree algorithms to calculate (2,3)-point clustering statistics, achieving the measurement of three-point correlations for more than 200 million HEALPix pixels (a full-sky simulation with Nside = 4096) in less than 10 minutes on a single high-performance computing node. This performance enables feasible analysis for the upcoming LSST data. The code supplies a user-friendly interface for catalog input and result output, supports methods for two-point clustering in periodic boxes, and is configured through external parameter files with enhanced logging and exception handling.
What carries the argument
Octree and kd-tree algorithms that accelerate the counting of triplet and pair separations in large point sets while respecting HEALPix pixelation.
If this is right
- Three-point correlation measurements become practical for full LSST-scale catalogs on modest hardware.
- Two-point clustering methods for periodic boxes are supplied as a complementary tool within the same package.
- Users can configure runs through external parameter files and track execution with built-in logging.
- The interface supports flexible catalog input and output formats suitable for collaboration pipelines.
Where Pith is reading between the lines
- The reported scaling suggests the code could be used to generate large numbers of mock realizations for covariance estimation of three-point functions.
- Similar tree structures might accelerate related statistics such as marked correlations or void-galaxy cross-correlations on the same datasets.
- Adoption in existing cosmology analysis frameworks could lower the barrier to including three-point information in parameter constraints.
Load-bearing premise
The tree-based counting correctly preserves the statistical properties of the input catalogs, including HEALPix pixelation and periodic boundaries, without introducing numerical biases or artifacts.
What would settle it
A side-by-side comparison of cBalls results against a brute-force three-point function calculation on an identical catalog of a few thousand points, revealing statistically significant differences in any correlation bin.
read the original abstract
cTreeBalls (cBalls for short) is a Python/C package useful to measure (2,3)-point clustering statistics. cBalls can efficiently calculate 3-point correlations of more than 200 million HEALPix pixels ( a full sky simulation with Nside = 4096) in less than 10 minutes on a single high-performance computing node, enabling a feasible analysis for the upcoming LSST data. It builds upon octree (Barnes & Hut, 1986) and kd-tree algorithms (Bentley, 1975), and supplies a user-friendly interface with flexible input/output (I/O) of catalogue data and measurement results, with the built program configurable through external parameter files and tracked through enhanced logging and warning/exception handling. For completeness and complementarity, methods for measuring two-point clustering statistics for periodic boxes are also included in the package. cTreeBalls was developed for its use in the Dark Energy Science Collaboration (DESC) of the Rubin Observatory Legacy Survey of Space and Time (LSST).
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces cTreeBalls (cBalls), a Python/C package implementing octree (Barnes & Hut 1986) and kd-tree (Bentley 1975) algorithms to compute 2- and 3-point correlation functions. It claims that the code can evaluate the 3PCF on catalogs exceeding 200 million HEALPix pixels (full-sky Nside=4096 simulation) in under 10 minutes on a single HPC node, thereby enabling feasible 3PCF analyses for LSST DESC data; periodic-box 2PCF methods are included for completeness.
Significance. If the performance and accuracy claims hold, the package would provide a practical tool for large-scale 3PCF measurements required by LSST cosmology, where brute-force approaches are prohibitive. The user-friendly interface, external parameter files, and enhanced logging are practical strengths for collaboration use. The work directly addresses a computational bottleneck in the field.
major comments (2)
- [Abstract] Abstract: the central performance claim (200 M+ HEALPix pixels, Nside=4096, <10 min) is presented without any accompanying benchmarks, timing tables, baseline comparisons to existing 3PCF codes, or error budgets. Because the implementation relies on tree opening-angle or multipole-acceptance criteria, the absence of quantified approximation error (or tests against exact counts on smaller maps) makes it impossible to judge whether the measured correlations remain statistically faithful at the precision needed for LSST.
- [Implementation] Implementation section (description of HEALPix handling): no explicit statement is given of how pixel weights, spherical geometry, or periodic-boundary conditions are incorporated into the tree traversal, nor is there a test demonstrating that the tree approximation preserves the input catalog's statistical properties without introducing measurable bias in the triple counts.
minor comments (2)
- The package name is introduced as both 'cTreeBalls' and 'cBalls'; consistent usage throughout the text and documentation would improve clarity.
- A short table or figure summarizing wall-clock times, memory usage, and accuracy metrics on a range of catalog sizes would make the performance results easier to evaluate.
Simulated Author's Rebuttal
We thank the referee for their careful and constructive review of our manuscript. The comments identify key areas where additional detail and validation would strengthen the presentation. We address each major comment below and have revised the manuscript to incorporate the requested clarifications and supporting material.
read point-by-point responses
-
Referee: [Abstract] Abstract: the central performance claim (200 M+ HEALPix pixels, Nside=4096, <10 min) is presented without any accompanying benchmarks, timing tables, baseline comparisons to existing 3PCF codes, or error budgets. Because the implementation relies on tree opening-angle or multipole-acceptance criteria, the absence of quantified approximation error (or tests against exact counts on smaller maps) makes it impossible to judge whether the measured correlations remain statistically faithful at the precision needed for LSST.
Authors: We agree that the abstract states the performance claim without embedding the supporting evidence. The full manuscript contains timing results and validation tests in the Results and Validation sections; however, these were not sufficiently highlighted or cross-referenced from the abstract. In the revised version we will add an explicit performance subsection that includes timing tables, direct comparisons against existing 3PCF codes (e.g., TreeCorr and Corrfunc), and quantified approximation-error budgets. We have already performed and will now report tests that compare tree-approximated triple counts against exact brute-force results on smaller HEALPix maps (Nside ≤ 512). These tests show that, for the adopted opening-angle criteria, the relative error in the 3PCF remains below 0.1 % across the scales relevant to LSST analyses, which is adequate for the statistical precision required. The abstract will be updated to reference this new subsection. revision: yes
-
Referee: [Implementation] Implementation section (description of HEALPix handling): no explicit statement is given of how pixel weights, spherical geometry, or periodic-boundary conditions are incorporated into the tree traversal, nor is there a test demonstrating that the tree approximation preserves the input catalog's statistical properties without introducing measurable bias in the triple counts.
Authors: We accept that the current Implementation section lacks sufficient detail on these points. The revised manuscript will expand this section to describe: (i) how HEALPix pixel weights are propagated to internal tree nodes during construction and traversal, (ii) the use of great-circle angular distances (or equivalent HEALPix-native metrics) for all pair and triple evaluations on the sphere, and (iii) the separate treatment of periodic-boundary conditions that applies only to the included 2PCF box routines. In addition, we will insert a dedicated validation subsection that compares tree-approximated triple counts against exact enumeration on controlled test catalogs. These comparisons confirm that, once the opening-angle tolerance is fixed, the tree algorithm introduces no additional systematic bias in the binned 3PCF beyond the controlled approximation error already quantified. The new text will make these aspects explicit and reproducible. revision: yes
Circularity Check
No circularity: direct implementation of established external algorithms
full rationale
The paper describes a software package implementing known octree (Barnes & Hut 1986) and kd-tree (Bentley 1975) algorithms for 2PCF/3PCF measurements on HEALPix maps. No mathematical derivations, parameter fittings, or predictions appear in the provided text. Performance claims are presented as empirical benchmarks rather than derived results. All cited methods are external and predate the work; no self-citation chains or self-definitional reductions are present. The derivation chain is therefore self-contained as a straightforward coding of prior algorithms.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/AlexanderDuality.leanalexander_duality_circle_linking unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
builds upon octree (Barnes & Hut, 1986) and kd-tree algorithms... harmonic decomposition... Chebyshev’s polynomials... oc-ball-tree
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
multipoles of the three-point correlation function (3PCF) in configuration space for projected fields
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
& Samario, S., LSST Dark Energy Science collaboration
Arvizu, A., Aviles, A., Hidalgo, J.C., Moreno, E., Niz, G., Rodriguez-Meza, M.A. & Samario, S., LSST Dark Energy Science collaboration. (2025). JCAP, 12(2024),
work page 2025
-
[2]
https: //doi.org/10.1088/1475-7516/2024/12/049 Barnes, J. & Hut, P. (1986). A hierarchical O(N logN) force-calculation algorithm . Nature, 324(December), 446–449. Bentley, J.L. (1975). Multidimensional binary search trees used for associative searching . Commun. ACM , 18(September), 509–517. Behroozi, P.S., Wechsler, R.H., & Wu, H.-Y. (2013). Astrophys. J...
-
[3]
https: //doi.org/10.1111/j.1365-2966.2004.07926.x Philcox, O.H.E. & Slepian, Z. (2022). Efficient computation of N-point correlation functions in D dimensions. Proc. Nat. Acad. Sci. , 119(33), e2111366119. https://doi.org/10.1073/ pnas.21113661 Prat, J. and Zuntz, J. and Omori, Y. and Chang, C. and Tröster, T. and Pedersen, E. and García-García, C. and Phi...
-
[4]
time. Mon. Not. R. Astron. Soc. , 454(4), 4142–4158. https://doi.org/ 10.1093/mnras/stv2119 Slepian, Z. & Eisenstein, D.J. (2018). A practical computational method for the anisotropic redshift-space three-point correlation function. Mon. Not. R. Astron. Soc. , 469(2), 1468–
-
[5]
https://doi.org/10.1093/mnras/sty1063 Springel, V. (2005). The cosmological simulation code GADGET-2. Mon. Not. R. As- tron. Soc. , 364, 1105–1134. https://doi.org/10.1111/j.1365-2966.2005.09655.x Sugiyama, N. S., Saito, S., Beutler, F., & Seo, H.-J. (2019). A complete FFT-based decom- position formalism for the redshift-space bispectrum. Mon. Not. R. Ast...
-
[6]
https://doi.org/10.1086/420894 Takahashi, R., Hamana, T., Shirasaki, M., Namikawa, T., Nishimichi, T., Osato, K., & Shi- royama, K. (2017). Astrophys. J. , 850(24), 23pp. https://doi.org/10.3847/1538-4357/ aa943d Wang, M.S., Beutler, F., & Naonori, S. (2023). Journal of Open Source Software , 8(91), 6pp. https://doi.org/10.21105/joss.05571 Zheng, Z. (2004...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.