
arxiv: 2502.17994 · v3 · submitted 2025-02-25 · ⚛️ physics.ins-det · physics.data-an

Probabilistic Analysis of Event-Mode Experimental Data

Pith reviewed 2026-05-23 03:01 UTC · model grok-4.3

classification ⚛️ physics.ins-det physics.data-an
keywords neutron scattering · event-mode data · probabilistic analysis · histogram-free methods · systematic error · data efficiency · x-ray scattering

The pith

Probabilistic modeling of raw neutron scattering events achieves greater efficiency and reduced systematic error without histogramming or least-squares fitting.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that neutron and x-ray scattering data can be analyzed by applying a probabilistic model directly to the list of individual detection events. This replaces the conventional steps of histogramming counts into bins and then performing least-squares fits of multiple distribution components. A sympathetic reader would care because the traditional stack can introduce systematic errors from bin choices and requires more events to reach a given precision on parameters such as scattering contributions. If the claim holds, experiments would need fewer recorded events for the same accuracy while experiencing less bias from the analysis choices themselves.

Core claim

Analyzing neutron scattering event data without numerical integration, histogramming, or least-squares fitting yields greater efficiency (fewer data points for the same parameter accuracy) and a reduced impact of inherent systematic error.

What carries the argument

A probabilistic model applied directly to the raw event list, which recovers the scientific parameters by treating each detection as an independent sample drawn from the underlying distributions.
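The idea can be illustrated with a minimal sketch of an unbinned maximum-likelihood fit. This is not the authors' code: the Gaussian model, the function names `negative_log_likelihood` and `fit_events`, and the simulated values are assumptions chosen for illustration. Each event contributes one term, log p(x_i | θ), and no histogram is built at any point.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm


def negative_log_likelihood(params, events):
    """Sum of -log p(x_i | mu, sigma) over the raw event list (no binning)."""
    mu, log_sigma = params
    sigma = np.exp(log_sigma)  # optimise log(sigma) so sigma stays positive
    return -np.sum(norm.logpdf(events, loc=mu, scale=sigma))


def fit_events(events):
    """Unbinned maximum-likelihood fit of a Gaussian to an event list."""
    events = np.asarray(events, dtype=float)
    # Crude starting point taken from the data themselves.
    start = np.array([np.median(events), np.log(np.std(events))])
    result = minimize(
        negative_log_likelihood, start, args=(events,), method="Nelder-Mead"
    )
    mu_hat, log_sigma_hat = result.x
    return mu_hat, np.exp(log_sigma_hat)


# Simulated events; the true values (2.0, 0.5) are hypothetical.
rng = np.random.default_rng(0)
events = rng.normal(loc=2.0, scale=0.5, size=5000)
mu_hat, sigma_hat = fit_events(events)
```

For a pure Gaussian the maximum-likelihood estimates are also available in closed form (the sample mean and standard deviation), so the optimiser is only there to show the pattern that carries over to models without a closed-form solution.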

If this is right

  • Experiments reach target parameter precision with fewer recorded events.
  • Systematic biases tied to histogram bin width selection disappear.
  • Multiple overlapping scientific contributions are quantified without separate fitting stages.
  • No numerical integration is required to evaluate the likelihood of the observed events.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same direct-event approach could transfer to other counting detectors such as those used in x-ray or particle physics experiments.
  • Parameter estimation could run in near real time during data acquisition once enough events accumulate for the model to converge.
  • The method opens a route to joint analysis of event timing and position information that binning typically discards.

Load-bearing premise

The probabilistic model applied directly to the raw event list can recover the scientific parameters of interest without introducing new modeling biases equivalent to those already present in histogram-based methods.

What would settle it

Run both the direct probabilistic method and conventional histogram-plus-least-squares analysis on the same simulated raw event lists with known true parameter values, then check whether the probabilistic method reaches target accuracy at noticeably lower event counts or shows lower bias.
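A test of this kind could be sketched as follows. This is an illustrative harness only, under stated assumptions: the events are drawn from a Gaussian, the event-mode estimator is the closed-form Gaussian maximum-likelihood estimate, the conventional route is an unweighted least-squares fit to a fixed 20-bin histogram, and all function names are hypothetical. The paper's own models and fitting choices may differ.

```python
import numpy as np
from scipy.optimize import curve_fit


def gaussian_counts(x, amplitude, mu, sigma):
    """Gaussian peak shape used for the histogram fit."""
    return amplitude * np.exp(-0.5 * ((x - mu) / sigma) ** 2)


def histogram_least_squares(events, bins=20):
    """Conventional route: bin the events, then least-squares fit the counts."""
    counts, edges = np.histogram(events, bins=bins)
    centres = 0.5 * (edges[:-1] + edges[1:])
    start = [counts.max(), np.mean(events), np.std(events)]
    params, _ = curve_fit(gaussian_counts, centres, counts, p0=start, maxfev=10000)
    return params[1], abs(params[2])  # sign of sigma is not identifiable


def event_mode_mle(events):
    """Event-mode route: closed-form Gaussian maximum-likelihood estimates."""
    return np.mean(events), np.std(events)


def compare(n_events, n_trials=100, true_mu=0.0, true_sigma=1.0, seed=1):
    """Repeat both analyses on the same simulated event lists."""
    rng = np.random.default_rng(seed)
    truth = np.array([true_mu, true_sigma])
    mle, lse = [], []
    for _ in range(n_trials):
        events = rng.normal(true_mu, true_sigma, n_events)
        mle.append(event_mode_mle(events))
        lse.append(histogram_least_squares(events))
    mle, lse = np.array(mle), np.array(lse)
    return {
        "mle_bias": mle.mean(axis=0) - truth,
        "lse_bias": lse.mean(axis=0) - truth,
        "mle_std": mle.std(axis=0),
        "lse_std": lse.std(axis=0),
    }


summary = compare(n_events=500)
```

Calling `compare` over a range of `n_events` and plotting the bias and spread of each estimator against event count would give the comparison described above; the bin count and the unweighted fit are arbitrary choices that a real study would need to vary.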

Figures

Figures reproduced from arXiv: 2502.17994 by Phillip M. Bentley, Thomas H. Rod.

Figure 1: Fits to random events, generated by a simple Gaussian function, …
Figure 2: Boxplot of the extracted parameters from MLE and LSE from …
Figure 3: Standard deviations and means of the extracted parameters as a …
Figure 4: Fits to random events, generated by a Cauchy distribution, using …
Figure 5: Boxplot of the extracted parameters from MLE and LSE from …
Figure 6: Simulation of 2D SANS detector map reduced to 1D event data …
Figure 7: 200 samples of MCMC from the (M, κ) parameter space. There are 32 parallel random walkers. If we now switch to event mode, and use the MCMC-sampled Bayesian method, the samples of the (M, κ) parameter space are shown in figure 7. The true parameter value is shown as a solid horizontal line, and we can see that the MCMC samples are pretty close. On the other hand, the least squares estimate is not as good. …
Figure 8: Comparison of the model PDF (solid line) with a histogram of the …
Figure 9: A least-squares fit of histogrammed data, compared to a MCMC …
Figure 10: ARCS experimental data plotted on a Freedman-Diaconis optimal …
Figure 11: Least-squares fit to the ARCS experimental data.
Figure 12: Least-squares fit to the ARCS experimental data (top) and an …
Figure 13: Verification that the parameter value drawn from the Markov …
Figure 14: KDE inspection of the elastic line using the full data set. The …
Figure 15: Comparison of the extracted parameters from the MCMC method …
Figure 16: Comparison of the relative amplitude parameter of the elastic …
Figure 17: KDE inspection of the first excitation line using the full data set.
Figure 18: Comparison of the extracted parameters from the MCMC method …
Figure 19: Comparison of the relative amplitude parameter of the first exci…
Figure 20: KDE inspection of the second excitation line using the full data …
Figure 21: Comparison of the extracted parameters from the MCMC method …
Figure 22: Comparison of the relative amplitude parameter of the second …
Figure 23: KDE inspection of the third excitation line using the full data …
Figure 24: Comparison of the extracted parameters from the MCMC method …
Figure 25: Comparison of the relative amplitude parameter of the third …
Figure 26: KDE inspection of the fourth excitation line using the full data …
Figure 27: Comparison of the extracted parameters from the MCMC method …
Figure 28: Comparison of the relative amplitude parameter of the fourth …
Figure 29: Comparison of the extracted parameters from the MCMC method …
Figure 30: Comparison of the relative amplitude parameter of the first back …
Figure 31: Comparison of the extracted parameters from the MCMC method …
Figure 32: Bayesian search for the location of the lost city of Atlantis! The …
Original abstract

Neutron and x-ray scattering experiments traditionally rely upon histogrammed data sets, which are analysed using least-squares curve fitting of multiple probability distribution components to quantify separately the various scientific contributions of interest. The main advantage to these methods is the relative ease of deployment due to their intuitive nature. Despite great popularity, these methods have known drawbacks, which can cause systematic errors and biases in some common scenarios in this field. Improvements over the base methods include dynamic optimisation of histogram bin width and the application of modern numerical optimisation methods that have greater stability, but, whilst reduced, the systematic effects carried by this stack nonetheless remain. In this study, we demonstrate analysis of neutron scattering event data using neither any numerical integration or histogramming steps, nor least squares fitting. The benefits of the new methodology are a greater efficiency (i.e. fewer data points required for the same parameter accuracy) and a reduced impact of inherent systematic error. The main drawbacks are a less intuitive analysis method and an increase in computation time.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes a probabilistic method for direct analysis of raw event lists from neutron and x-ray scattering experiments. It claims this approach avoids all histogramming, numerical integration, and least-squares fitting steps, thereby achieving greater statistical efficiency (fewer events needed for equivalent parameter precision) and lower systematic bias than conventional histogram-based least-squares analyses.

Significance. A validated method that correctly recovers scattering parameters from event data while eliminating both binning artifacts and quadrature error would be of clear practical value in the neutron/x-ray scattering community, where systematic biases from histogram construction are a known concern. The abstract, however, supplies no derivation of the likelihood, no demonstration that the required normalization integral can be avoided without restricting the model class, and no numerical validation, so the significance cannot yet be assessed.

major comments (2)
  1. [Abstract] The central claim that analysis proceeds with 'neither any numerical integration' is load-bearing for the efficiency and bias-reduction assertions, yet it is in direct tension with the standard inhomogeneous Poisson point-process likelihood for event data, whose log-likelihood contains the term −∫_domain λ(x) dx. The manuscript must specify either the restricted functional form of λ(x) that admits an analytic antiderivative for all parameter values or the approximation that replaces the integral; without this, the method cannot be evaluated against the skeptic's concern.
  2. [Abstract] No derivation of the proposed likelihood, no error-propagation analysis, no comparison against synthetic or real data with known ground truth, and no efficiency metric (e.g., Fisher information per event versus binned least-squares) is supplied. The claims of 'greater efficiency' and 'reduced impact of inherent systematic error' therefore rest on assertion rather than demonstrated support.
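For reference, the likelihood named in comment 1 can be written out, together with one standard way the integral term becomes trivial. This is a textbook identity offered as a sketch, not a statement of what the manuscript does; the symbols Λ (expected total count), p (normalized shape density), and D (measurement domain) are introduced here for illustration.

```latex
% Inhomogeneous Poisson point process observed on domain D,
% with N recorded events x_1, ..., x_N:
\log L(\theta) = \sum_{i=1}^{N} \log \lambda(x_i;\theta)
                 - \int_{D} \lambda(x;\theta)\,dx

% Assume the intensity factorizes into a total rate and a normalized PDF:
\lambda(x;\theta) = \Lambda\, p(x;\theta),
\qquad \int_{D} p(x;\theta)\,dx = 1

% The integral then reduces to \Lambda with no quadrature:
\log L(\theta,\Lambda) = \sum_{i=1}^{N} \log p(x_i;\theta)
                         + N \log \Lambda - \Lambda

% Maximizing over \Lambda gives \hat{\Lambda} = N, so the shape
% parameters depend only on the per-event term:
\hat{\theta} = \arg\max_{\theta} \sum_{i=1}^{N} \log p(x_i;\theta)
```

The factorization only removes the integral when p is normalized over the domain actually observed. If the detector covers a finite window, the normalization of p needs the model's cumulative distribution function at the window edges, which is analytic for some model families and not for others.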

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful review and constructive comments. We agree that the abstract requires revision to explicitly address the technical points raised and better support the manuscript's claims. We respond to each major comment below.

Point-by-point responses
  1. Referee: [Abstract] The central claim that analysis proceeds with 'neither any numerical integration' is load-bearing for the efficiency and bias-reduction assertions, yet it is in direct tension with the standard inhomogeneous Poisson point-process likelihood for event data, whose log-likelihood contains the term −∫_domain λ(x) dx. The manuscript must specify either the restricted functional form of λ(x) that admits an analytic antiderivative for all parameter values or the approximation that replaces the integral; without this, the method cannot be evaluated against the skeptic's concern.

    Authors: We agree that the abstract should make this specification explicit. The manuscript employs a restricted parametric family for the intensity λ(x) whose integral over the domain has a closed-form antiderivative for every admissible parameter value, so that the Poisson point-process likelihood can be evaluated exactly without numerical quadrature. We will revise the abstract to state this functional form and its analytic normalization property.

    Revision: yes

  2. Referee: [Abstract] No derivation of the proposed likelihood, no error-propagation analysis, no comparison against synthetic or real data with known ground truth, and no efficiency metric (e.g., Fisher information per event versus binned least-squares) is supplied. The claims of 'greater efficiency' and 'reduced impact of inherent systematic error' therefore rest on assertion rather than demonstrated support.

    Authors: The full manuscript contains the likelihood derivation, the error-propagation analysis, comparisons against synthetic data with known ground truth, and quantitative efficiency metrics (including Fisher information per event). The abstract summarizes these results but, due to length limits, does not reproduce the supporting material. We will expand the abstract to reference these analyses and their principal findings so that the efficiency and bias-reduction claims are visibly grounded.

    Revision: yes
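The efficiency metric mentioned in this exchange, Fisher information per event, has a standard definition that can be stated briefly. The expressions below are textbook results given for orientation; the Gaussian example with known width σ is an assumption chosen for simplicity and is not taken from the manuscript.

```latex
% Fisher information carried by a single event drawn from p(x;\theta):
I_1(\theta) = \mathbb{E}\!\left[
    \left( \frac{\partial}{\partial\theta} \log p(x;\theta) \right)^{2}
\right]

% For N independent events the information adds:
I_N(\theta) = N\, I_1(\theta)

% Cramer-Rao bound for any unbiased estimator:
\operatorname{Var}(\hat{\theta}) \ge \frac{1}{N\, I_1(\theta)}

% Example: Gaussian location \mu with known width \sigma
I_1(\mu) = \frac{1}{\sigma^{2}}
\quad\Longrightarrow\quad
\operatorname{Var}(\hat{\mu}) \ge \frac{\sigma^{2}}{N}

% Efficiency of an estimator relative to the bound (at most 1 if unbiased):
e(\hat{\theta}) = \frac{1}{N\, I_1(\theta)\, \operatorname{Var}(\hat{\theta})}
```

Under this definition, "fewer data points for the same parameter accuracy" corresponds to an estimator whose efficiency e is closer to 1, since the event count needed to reach a fixed variance scales as 1/e.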

Circularity Check

0 steps flagged

No circularity in derivation; method claims direct probabilistic treatment of events

full rationale

The provided abstract and description contain no equations, self-definitions, or fitted parameters renamed as predictions. The central claim is that a probabilistic model applied directly to the raw event list recovers parameters without histogramming, least-squares, or numerical integration. No load-bearing self-citation, uniqueness theorem, or ansatz is quoted. The derivation chain is not shown to reduce to its inputs by construction; the efficiency and bias-reduction claims are presented as empirical outcomes of the approach rather than tautological. This is the normal case of a self-contained proposal against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no information on free parameters, background axioms, or new entities; ledger therefore empty.



Reference graph

Works this paper leans on

7 extracted references · 7 canonical work pages · 1 internal anchor

  1. David Freedman and Persi Diaconis. On the histogram as a density estimator: L2 theory. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 57:453–476, 1981.

  2. D. W. Hogg, J. Bovy, and D. Lang. Data analysis recipes: Fitting a model to data. arXiv, astro-ph.IM:1008.4686v1, 2010.

  3. D. Foreman-Mackey. https://dfm.io/posts/mixture-models/, Dec 2014.

  4. F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

  5. A. Clauset, C. R. Shalizi, and M. E. J. Newman. Power-law distributions in empirical data. SIAM Review, 51(4):661–703, 2009.

  6. P. M. Bentley. Error rates in SARS-CoV-2 testing examined with Bayes’ theorem. Heliyon, 7(4):E06905, 2021.

  7. F. Kafka. The Trial (Der Prozess). Verlag Die Schmiede, Berlin, 1924.