pith. machine review for the scientific record.

arxiv: 2512.24290 · v2 · submitted 2025-12-30 · ⚛️ physics.ins-det · cs.LG · physics.data-an

Recognition: 2 Lean theorem links

Fast reconstruction-based ROI triggering via anomaly detection in the CYGNO optical TPC

Authors on Pith: no claims yet

Pith reviewed 2026-05-16 19:03 UTC · model grok-4.3

classification ⚛️ physics.ins-det · cs.LG · physics.data-an
keywords anomaly detection · autoencoder · optical TPC · ROI triggering · data reduction · CYGNO · unsupervised learning · real-time processing

The pith

A pedestal-trained convolutional autoencoder detects particle signals in optical TPC images through reconstruction residuals for fast ROI triggering.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents an unsupervised method to extract regions of interest from large optical TPC images using a convolutional autoencoder. The autoencoder is trained only on pedestal images containing detector noise, learning to reconstruct the background without any labels or simulations. When applied to data frames, high reconstruction residuals mark the locations of particle tracks and interactions. These regions are then clustered into compact ROIs. Tests on real CYGNO prototype data show the best setup keeps 93 percent of the signal intensity while removing 97.8 percent of the image area in roughly 25 milliseconds per frame on a consumer GPU.

Core claim

The authors claim that reconstruction-based anomaly detection with a pedestal-trained convolutional autoencoder enables efficient ROI triggering in optical TPCs. Thresholding and clustering of the reconstruction residuals turn particle-induced structures into compact ROIs extracted directly from raw frames. Using real data, one configuration retains 93.0 +/- 0.2 percent of reconstructed signal intensity while discarding 97.8 +/- 0.1 percent of the image area, with 25 ms inference time on consumer GPU hardware. The study shows that the choice of training objective is key to performance and that this approach offers a transparent, detector-agnostic baseline for online data reduction.
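The headline figures imply a concrete data-reduction budget. A quick consistency check (the percentages and the 25 ms latency are from the abstract; everything derived from them is simple arithmetic):

```python
# Back-of-the-envelope data-reduction budget from the quoted figures.
area_discarded = 0.978          # fraction of image area removed
signal_retained = 0.930         # fraction of reconstructed signal kept
inference_s = 0.025             # ~25 ms per frame on a consumer GPU

area_kept = 1.0 - area_discarded
compression = 1.0 / area_kept            # raw area compression factor
max_rate_hz = 1.0 / inference_s          # upper bound on sustained frame rate

print(f"area kept per frame: {area_kept:.1%}")      # 2.2%
print(f"area compression:    ~{compression:.0f}x")  # ~45x
print(f"signal kept:         {signal_retained:.1%}")
print(f"max sustained rate:  {max_rate_hz:.0f} frames/s")  # 40 frames/s
```

So the quoted numbers amount to roughly a 45x reduction in pixel volume at a 40 Hz ceiling, per GPU.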

What carries the argument

Convolutional autoencoder trained exclusively on pedestal images, using reconstruction residuals to detect anomalies corresponding to particle events.
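The residual-to-mask step that carries the argument can be sketched in a few lines. The mean-filter "reconstruction" below is only a stand-in for the trained autoencoder, and every scale, size, and the threshold rule are invented for illustration; only the shape of the computation — reconstruct, take absolute residuals, threshold against a pedestal-calibrated cut — follows the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def reconstruct(x):
    """Stand-in for the pedestal-trained autoencoder: a 7x7 mean filter
    reproduces the slowly varying background but not sharp tracks."""
    k, pad = 7, np.pad(x, 3, mode="edge")
    return np.array([[pad[i:i + k, j:j + k].mean()
                      for j in range(x.shape[1])]
                     for i in range(x.shape[0])])

# Calibrate the residual threshold on a pedestal-only frame,
# mirroring the paper's pedestal-trained setup (noise scales invented).
pedestal = rng.normal(100.0, 2.0, size=(64, 64))
res_ped = np.abs(pedestal - reconstruct(pedestal))
tau = res_ped.mean() + 5.0 * res_ped.std()

# Data frame: same noise plus one bright track-like stripe.
frame = rng.normal(100.0, 2.0, size=(64, 64))
frame[30:34, 10:50] += 25.0

residual = np.abs(frame - reconstruct(frame))   # anomaly map
anomaly_mask = residual > tau                   # pixels flagged anomalous

print("anomalous pixels:", int(anomaly_mask.sum()))
```

The flagged pixels concentrate on and immediately around the injected stripe while the pedestal-calibrated threshold keeps the pure-noise region almost empty, which is the behavior the paper relies on.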

If this is right

  • Fast ROI extraction supports real-time data selection in optical TPC experiments.
  • Significant reduction in data volume preserves signal for rare event analysis.
  • Training objective design is critical for the success of reconstruction-based anomaly detection.
  • The method operates without requiring labeled data or detailed detector simulations.
  • Inference runs at approximately 25 ms per frame on consumer GPUs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This approach could scale to larger optical TPC arrays used in dark matter searches.
  • Similar autoencoder methods might apply to other high-resolution imaging detectors in particle physics.
  • Validation against fully simulated datasets would help measure any signal loss precisely.

Load-bearing premise

That residuals from reconstructing pedestal images with the autoencoder will accurately and completely identify particle-induced features without substantial false positives or missed signals.

What would settle it

A dataset with independently verified particle events where the extracted ROIs fail to capture most of the signal intensity or retain more than a few percent of the background area.
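The two figures of merit quoted throughout — signal-intensity coverage and area cut — reduce to simple mask arithmetic. A minimal sketch with made-up masks (shapes and values are illustrative, not the paper's data):

```python
import numpy as np

# Toy frame: invented intensities and masks for illustration only.
intensity = np.zeros((100, 100))
intensity[40:44, 20:80] = 1.0          # "reconstructed signal" pixels

roi_mask = np.zeros((100, 100), dtype=bool)
roi_mask[38:46, 15:85] = True          # ROI returned by the trigger

coverage = intensity[roi_mask].sum() / intensity.sum()   # signal kept
area_cut = 1.0 - roi_mask.mean()                         # area discarded

print(f"signal-intensity coverage: {coverage:.1%}")   # 100.0%
print(f"area cut: {area_cut:.1%}")                    # 94.4%
```

A decisive test would report exactly these two quantities against independently verified events rather than against the same reconstruction that defines the signal.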

Figures

Figures reproduced from arXiv: 2512.24290 by A. Croce, A. Messina, A. Russo, C. Capoccia, C. M. B. Monteiro, D. Fiorina, D. J. G. Marques, D. Pierluigi, D. Pinci, D. Tozzi, E. Baracchini, E. Di Marco, E. Kemp, E. Paoletti, F. D. Amaro, F. Iacoangeli, F. Petrucci, F. Renga, G. Cavoto, G. Dho, G. D'Imperio, G. Maccarrone, G. Mazzitelli, G. M. Oppedisano, G. Saviano, H. P. Lima Jr., I. A. Costa, I. F. Pains, J. M. F. dos Santos, L. Benussi, L. G. M. de Carvalho, M. Caponero, M. D'Astolfo, N. J. Spooner, P. A. O. C. Silva, P. Meloni, R. A. Nobrega, R. Antonietti, R. D. P. Mano, R. Tesauro, S. Piacentini, S. Tomassini, V. Monno, Z. Islam.

Figure 1: Schematic representation of the convolutional autoencoder architecture. The pedestal dataset used for training contains 105 frames. Despite its modest size, it is sufficient for learning the highly homogeneous noise morphology of the detector, illustrating the low calibration and data requirements of this approach. No explicit hot-pixel masking is applied: persistent camera hot pixels appear with a fixed p…
Figure 2: Example of synthetic perturbations injected during training of the refined autoencoder. From left to right: (a) clean pedestal frame used as reconstruction target; (b) corrupted input obtained by injecting synthetic curved strokes and Gaussian blobs with varying amplitude; (c) absolute difference between input and target (shown for visualization); (d) binary injection mask m used to up-weight the reconstru…
Figure 3: Representative Regions of Interest (ROIs) returned by the anomaly-detection framework. Each row shows: (a) the fiducialized camera image; (b) the anomaly map, where track-like structures appear as localized high-residual regions; (c) the final ROI mask after spatial aggregation. The ROIs reliably enclose particle-induced structures while excluding noise-dominated background regions.
Figure 4: Trade-off between mean signal-intensity coverage and mean area cut for the three anomaly-scoring approaches, obtained by sweeping the residual threshold τ. All methods share the same ROI-extraction pipeline and are evaluated on the same event sample. Curves closer to the top-right indicate stronger compression at fixed signal retention.
Figure 5: Signal-intensity coverage as a function of reconstructed event energy for the refined-training autoencoder. The method maintains high coverage across the full energy range. A small number of outliers at very low coverage are discussed separately in Section 4.5.
Figure 6: Representative examples of events with near-zero signal-intensity coverage. Each panel shows the original camera image and the corresponding ROI mask produced by the refined-training autoencoder, with pixels assigned to the event by the reference reconstruction overlaid. No clear track-like structures are visible in the raw images, indicating that these cases arise from reconstruction artifacts rather than…
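The refined training objective illustrated in Figure 2 up-weights reconstruction error under the binary injection mask m. One plausible rendering is a weighted per-pixel MSE; the weight value and the exact loss form are assumptions, since the paper's loss is not reproduced here.

```python
import numpy as np

def weighted_recon_loss(pred, target, inj_mask, w=10.0):
    """Per-pixel squared error, up-weighted by w where synthetic
    perturbations were injected (inj_mask True). The weight w and the
    use of MSE are illustrative assumptions, not the paper's exact loss."""
    err = (pred - target) ** 2
    weights = np.where(inj_mask, w, 1.0)
    return (weights * err).sum() / weights.sum()

rng = np.random.default_rng(1)
target = rng.normal(size=(8, 8))         # clean pedestal patch (target)
inj = np.zeros((8, 8), dtype=bool)
inj[2:4, 2:6] = True                     # injected-stroke mask
pred = target + inj * 0.5                # model fails only on injected pixels

print(weighted_recon_loss(pred, target, inj))
```

Up-weighting the injected pixels forces the network to reconstruct the clean pedestal underneath the perturbations, rather than learning to pass anomalies through — which is the failure mode the plain objective risks.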
Original abstract

Optical-readout Time Projection Chambers (TPCs) produce megapixel-scale images whose fine-grained topological information is essential for rare-event searches, but whose size challenges real-time data selection. We present an unsupervised, reconstruction-based anomaly-detection strategy for fast Region-of-Interest (ROI) extraction that operates directly on minimally processed camera frames. A convolutional autoencoder trained exclusively on pedestal images learns the detector noise morphology without labels, simulation, or fine-grained calibration. Applied to standard data-taking frames, localized reconstruction residuals identify particle-induced structures, from which compact ROIs are extracted via thresholding and spatial clustering. Using real data from the CYGNO optical TPC prototype, we compare two pedestal-trained autoencoder configurations that differ only in their training objective, enabling a controlled study of its impact. The best configuration retains (93.0 +/- 0.2)% of reconstructed signal intensity while discarding (97.8 +/- 0.1)% of the image area, with an inference time of approximately 25 ms per frame on a consumer GPU. The results demonstrate that careful design of the training objective is critical for effective reconstruction-based anomaly detection and that pedestal-trained autoencoders provide a transparent and detector-agnostic baseline for online data reduction in optical TPCs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents an unsupervised, reconstruction-based anomaly detection strategy for fast ROI extraction in optical TPCs. A convolutional autoencoder is trained exclusively on pedestal images to learn detector noise morphology; localized residuals on standard data frames are thresholded and clustered to define compact ROIs. On real CYGNO prototype data, two training-objective variants are compared, with the best achieving (93.0 ± 0.2)% retention of reconstructed signal intensity while discarding (97.8 ± 0.1)% of the image area at ~25 ms inference per frame on a consumer GPU. The work positions the approach as a transparent, simulation-free baseline for online data reduction.

Significance. If the performance metrics are robustly validated, the method offers a practical, detector-agnostic route to real-time data reduction for megapixel-scale optical TPC images in rare-event searches. The controlled comparison of training objectives demonstrates that objective design materially affects residual quality, providing transferable guidance for anomaly-detection pipelines in instrumentation. The low-latency inference on consumer hardware is a concrete operational advantage.

major comments (2)
  1. Results section (performance metrics): The headline claim of (93.0 ± 0.2)% signal retention is computed from reconstructed signal intensity on the same camera frames used to generate the ROIs. The manuscript does not state whether the downstream reconstruction operates on the full frame or is restricted to the autoencoder-selected ROIs; without this clarification or an independent ground-truth measure (e.g., external coincidence or injected charge), the retention figure risks partial circular dependence on the very reconstruction the method aims to accelerate.
  2. Methods and validation: No baseline comparisons (e.g., simple intensity thresholding or alternative unsupervised detectors) and no simulation cross-checks of the residual-to-signal mapping are reported, despite the abstract's quantitative claims on real data. The absence of these controls leaves the central performance numbers only moderately supported and makes it difficult to isolate the contribution of the autoencoder design.
minor comments (2)
  1. The inference time is stated as 'approximately 25 ms'; reporting the exact mean, standard deviation, and hardware configuration (GPU model, batch size) would improve reproducibility.
  2. The spatial clustering step used to form compact ROIs from thresholded residuals is described only at a high level; specifying the algorithm, connectivity criterion, and minimum-size cut would aid implementation.
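For reference, the aggregation step the report asks to be pinned down typically amounts to connected-component labelling of the thresholded mask followed by a minimum-size cut. A self-contained sketch (4-connectivity and the 5-pixel cut are assumptions, not the paper's settings):

```python
import numpy as np
from collections import deque

def extract_rois(mask, min_size=5):
    """Group thresholded pixels into 4-connected components and return
    bounding boxes (rmin, rmax, cmin, cmax) of components >= min_size."""
    seen = np.zeros_like(mask, dtype=bool)
    rois = []
    rows, cols = mask.shape
    for r in range(rows):
        for c in range(cols):
            if not mask[r, c] or seen[r, c]:
                continue
            comp, q = [], deque([(r, c)])   # BFS over one component
            seen[r, c] = True
            while q:
                i, j = q.popleft()
                comp.append((i, j))
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < rows and 0 <= nj < cols \
                            and mask[ni, nj] and not seen[ni, nj]:
                        seen[ni, nj] = True
                        q.append((ni, nj))
            if len(comp) >= min_size:        # drop isolated noise hits
                rs = [p[0] for p in comp]
                cs = [p[1] for p in comp]
                rois.append((min(rs), max(rs), min(cs), max(cs)))
    return rois

mask = np.zeros((20, 20), dtype=bool)
mask[5:7, 3:12] = True       # track-like component (18 px)
mask[15, 15] = True          # single noise pixel, below min_size
print(extract_rois(mask))    # → [(5, 6, 3, 11)]
```

The minimum-size cut is what suppresses isolated hot pixels and shot-noise fluctuations; reporting its value alongside the connectivity choice would make the pipeline reproducible.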

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments on our manuscript. We address each major comment below and have revised the manuscript to improve clarity and add requested controls where feasible.

read point-by-point responses
  1. Referee: Results section (performance metrics): The headline claim of (93.0 ± 0.2)% signal retention is computed from reconstructed signal intensity on the same camera frames used to generate the ROIs. The manuscript does not state whether the downstream reconstruction operates on the full frame or is restricted to the autoencoder-selected ROIs; without this clarification or an independent ground-truth measure (e.g., external coincidence or injected charge), the retention figure risks partial circular dependence on the very reconstruction the method aims to accelerate.

    Authors: We thank the referee for highlighting this potential ambiguity. The signal retention is computed by first applying the standard reconstruction algorithm to the entire camera frame to determine the total reconstructed signal intensity, then calculating the fraction of that intensity captured inside the ROIs defined from autoencoder residuals. The ROI selection depends only on reconstruction residuals and is independent of the signal-intensity measurement. We will revise the Results section to state this procedure explicitly. An independent ground-truth measure (e.g., external coincidence) is not available in the existing prototype dataset and would require new hardware and data-taking runs outside the present scope. revision: partial

  2. Referee: Methods and validation: No baseline comparisons (e.g., simple intensity thresholding or alternative unsupervised detectors) and no simulation cross-checks of the residual-to-signal mapping are reported, despite the abstract's quantitative claims on real data. The absence of these controls leaves the central performance numbers only moderately supported and makes it difficult to isolate the contribution of the autoencoder design.

    Authors: We agree that explicit baselines strengthen the claims. In the revised manuscript we will add a direct comparison with simple intensity thresholding, showing that the autoencoder yields higher signal retention at comparable area reduction. Because the method is deliberately simulation-free, we have not performed simulation cross-checks; instead, the controlled comparison of the two training objectives on real pedestal and data frames already isolates the effect of objective choice on residual quality. We will expand the Methods section to articulate this validation strategy. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical metrics measured on held-out experimental frames

full rationale

The paper trains a convolutional autoencoder exclusively on pedestal images and applies it to standard data-taking frames to extract ROIs via residuals, thresholding, and clustering. The headline performance figures (93.0 % signal retention, 97.8 % area discard) are stated as direct measurements of reconstructed signal intensity on real CYGNO prototype data, with no equations, self-citations, or definitions shown that reduce these quantities to fitted parameters or inputs defined by the same method. The derivation chain consists of an unsupervised training step followed by independent evaluation on separate frames; no load-bearing step collapses by construction to the inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the assumption that an autoencoder trained only on pedestal images will produce residuals that correspond to particle signals; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)
  • domain assumption An autoencoder trained exclusively on pedestal images learns detector noise morphology sufficiently well to flag particle-induced structures via reconstruction residuals.
    Invoked by the training strategy and the claim that residuals identify particle structures.

pith-pipeline@v0.9.0 · 5783 in / 1292 out tokens · 63809 ms · 2026-05-16T19:03:29.734303+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

16 extracted references · 16 canonical work pages · 1 internal anchor

  1. Fernando Domingues Amaro et al. The CYGNO Experiment. Instruments, 6(1):6, 2022.
  2. Fernando Domingues Amaro et al. Bayesian network 3D event reconstruction in the CYGNO optical TPC for dark matter direct detection. Eur. Phys. J. C, 85(11):1261, 2025.
  3. Fernando Domingues Amaro et al. The CYGNO experiment, a directional detector for direct Dark Matter searches. Nucl. Instrum. Meth. A, 1054:168325, 2023.
  4. F. D. Amaro et al. Directional iDBSCAN to detect cosmic-ray tracks for the CYGNO experiment. Measurement Science and Technology, 34(12):125024, 2023.
  5. Taoli Cheng, Jean-François Arguin, Julien Leissner-Martin, Jacinthe Pilette, and Tobias Golling. Variational autoencoders for anomalous jet tagging. Phys. Rev. D, 107(1):016002, 2023.
  6. Marco Farina, Yuichiro Nakai, and David Shih. Searching for New Physics with Deep Autoencoders. Phys. Rev. D, 101(7):075021, 2020.
  7. Andrew Blance, Michael Spannowsky, and Philip Waite. Adversarially-trained autoencoders for robust unsupervised new physics searches. JHEP, 10:047, 2019.
  8. Jan Hajer, Ying-Ying Li, Tao Liu, and He Wang. Novelty detection meets collider physics. Phys. Rev. D, 101:076015, 2020.
  9. Tuhin S. Roy and Aravind H. Vijay. A robust anomaly finder based on autoencoders. 2019.
  10. Haoqi Huang, Ping Wang, Jianhua Pei, Jiacheng Wang, Shahen Alexanian, and Dusit Niyato. Deep Learning Advancements in Anomaly Detection: A Comprehensive Survey. arXiv:2503.13195, 2025.
  11. Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org
  12. Dor Bank, Noam Koenigstein, and Raja Giryes. Autoencoders. Machine Learning for Data Science Handbook: Data Mining and Knowledge Discovery Handbook, pages 353–374, 2023.
  13. B. D. Almeida et al. Characterization of cutting-edge CMOS Active Pixel sensors within the CYGNO Experiment. 2025.
  14. Zhou Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004.
  15. Diederik P. Kingma and Jimmy Ba. Adam: A Method for Stochastic Optimization. CoRR, abs/1412.6980, 2014.
  16. G. M. Oppedisano. Trigger optimization and event classification for dark matter searches in the CYGNO experiment using machine learning. Master's thesis, Sapienza University of Rome, Rome, Italy, 2025. Advisor: A. Messina; Co-advisor: S. Piacentini.