pith. machine review for the scientific record.

arxiv: 2512.24290 · v2 · submitted 2025-12-30 · ⚛️ physics.ins-det · cs.LG · physics.data-an

Recognition: 2 Lean theorem links

Fast reconstruction-based ROI triggering via anomaly detection in the CYGNO optical TPC

Authors on Pith: no claims yet

Pith reviewed 2026-05-16 19:03 UTC · model grok-4.3

classification ⚛️ physics.ins-det · cs.LG · physics.data-an
keywords anomaly detection · autoencoder · optical TPC · ROI triggering · data reduction · CYGNO · unsupervised learning · real-time processing

The pith

A pedestal-trained convolutional autoencoder detects particle signals in optical TPC images through reconstruction residuals for fast ROI triggering.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents an unsupervised method to extract regions of interest from large optical TPC images using a convolutional autoencoder. The autoencoder is trained only on pedestal images containing detector noise, learning to reconstruct the background without any labels or simulations. When applied to data frames, high reconstruction residuals mark the locations of particle tracks and interactions. These regions are then clustered into compact ROIs. Tests on real CYGNO prototype data show the best setup keeps 93 percent of the signal intensity while removing 97.8 percent of the image area in roughly 25 milliseconds per frame on a consumer GPU.

Core claim

The authors claim that reconstruction-based anomaly detection with a pedestal-trained convolutional autoencoder enables efficient ROI triggering in optical TPCs. Thresholding and clustering of the reconstruction residuals turn particle-induced structures into compact ROIs extracted directly from raw frames. Using real data, one configuration retains 93.0 +/- 0.2 percent of reconstructed signal intensity while discarding 97.8 +/- 0.1 percent of the image area, with 25 ms inference time on consumer GPU hardware. The study shows that the choice of training objective is key to performance and that this approach offers a transparent, detector-agnostic baseline for online data reduction.
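The headline figures imply a concrete data-reduction budget. A quick consistency check (the percentages and the 25 ms latency are from the abstract; everything derived from them is simple arithmetic):

```python
# Back-of-the-envelope data-reduction budget from the quoted figures.
area_discarded = 0.978          # fraction of image area removed
signal_retained = 0.930         # fraction of reconstructed signal kept
inference_s = 0.025             # ~25 ms per frame on a consumer GPU

area_kept = 1.0 - area_discarded
compression = 1.0 / area_kept            # raw area compression factor
max_rate_hz = 1.0 / inference_s          # upper bound on sustained frame rate

print(f"area kept per frame: {area_kept:.1%}")      # 2.2%
print(f"area compression:    ~{compression:.0f}x")  # ~45x
print(f"signal kept:         {signal_retained:.1%}")
print(f"max sustained rate:  {max_rate_hz:.0f} frames/s")  # 40 frames/s
```

So the quoted numbers amount to roughly a 45x reduction in pixel volume at a 40 Hz ceiling, per GPU.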

What carries the argument

Convolutional autoencoder trained exclusively on pedestal images, using reconstruction residuals to detect anomalies corresponding to particle events.
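The residual-to-mask step that carries the argument can be sketched in a few lines. The mean-filter "reconstruction" below is only a stand-in for the trained autoencoder, and every scale, size, and the threshold rule are invented for illustration; only the shape of the computation — reconstruct, take absolute residuals, threshold against a pedestal-calibrated cut — follows the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def reconstruct(x):
    """Stand-in for the pedestal-trained autoencoder: a 7x7 mean filter
    reproduces the slowly varying background but not sharp tracks."""
    k, pad = 7, np.pad(x, 3, mode="edge")
    return np.array([[pad[i:i + k, j:j + k].mean()
                      for j in range(x.shape[1])]
                     for i in range(x.shape[0])])

# Calibrate the residual threshold on a pedestal-only frame,
# mirroring the paper's pedestal-trained setup (noise scales invented).
pedestal = rng.normal(100.0, 2.0, size=(64, 64))
res_ped = np.abs(pedestal - reconstruct(pedestal))
tau = res_ped.mean() + 5.0 * res_ped.std()

# Data frame: same noise plus one bright track-like stripe.
frame = rng.normal(100.0, 2.0, size=(64, 64))
frame[30:34, 10:50] += 25.0

residual = np.abs(frame - reconstruct(frame))   # anomaly map
anomaly_mask = residual > tau                   # pixels flagged anomalous

print("anomalous pixels:", int(anomaly_mask.sum()))
```

The flagged pixels concentrate on and immediately around the injected stripe while the pedestal-calibrated threshold keeps the pure-noise region almost empty, which is the behavior the paper relies on.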

If this is right

  • Fast ROI extraction supports real-time data selection in optical TPC experiments.
  • Significant reduction in data volume preserves signal for rare event analysis.
  • Training objective design is critical for the success of reconstruction-based anomaly detection.
  • The method operates without requiring labeled data or detailed detector simulations.
  • Inference runs at approximately 25 ms per frame on consumer GPUs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This approach could scale to larger optical TPC arrays used in dark matter searches.
  • Similar autoencoder methods might apply to other high-resolution imaging detectors in particle physics.
  • Validation against fully simulated datasets would help measure any signal loss precisely.

Load-bearing premise

That residuals from reconstructing pedestal images with the autoencoder will accurately and completely identify particle-induced features without substantial false positives or missed signals.

What would settle it

A dataset with independently verified particle events where the extracted ROIs fail to capture most of the signal intensity or retain more than a few percent of the background area.
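The two figures of merit quoted throughout — signal-intensity coverage and area cut — reduce to simple mask arithmetic. A minimal sketch with made-up masks (shapes and values are illustrative, not the paper's data):

```python
import numpy as np

# Toy frame: invented intensities and masks for illustration only.
intensity = np.zeros((100, 100))
intensity[40:44, 20:80] = 1.0          # "reconstructed signal" pixels

roi_mask = np.zeros((100, 100), dtype=bool)
roi_mask[38:46, 15:85] = True          # ROI returned by the trigger

coverage = intensity[roi_mask].sum() / intensity.sum()   # signal kept
area_cut = 1.0 - roi_mask.mean()                         # area discarded

print(f"signal-intensity coverage: {coverage:.1%}")   # 100.0%
print(f"area cut: {area_cut:.1%}")                    # 94.4%
```

A decisive test would report exactly these two quantities against independently verified events rather than against the same reconstruction that defines the signal.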

Figures

Figures reproduced from arXiv: 2512.24290 by A. Croce, A. Messina, A. Russo, C. Capoccia, C. M. B. Monteiro, D. Fiorina, D. J. G. Marques, D. Pierluigi, D. Pinci, D. Tozzi, E. Baracchini, E. Di Marco, E. Kemp, E. Paoletti, F. D. Amaro, F. Iacoangeli, F. Petrucci, F. Renga, G. Cavoto, G. Dho, G. D'Imperio, G. Maccarrone, G. Mazzitelli, G. M. Oppedisano, G. Saviano, H. P. Lima Jr., I. A. Costa, I. F. Pains, J. M. F. dos Santos, L. Benussi, L. G. M. de Carvalho, M. Caponero, M. D'Astolfo, N. J. Spooner, P. A. O. C. Silva, P. Meloni, R. A. Nobrega, R. Antonietti, R. D. P. Mano, R. Tesauro, S. Piacentini, S. Tomassini, V. Monno, Z. Islam.

Figure 1: Schematic representation of the convolutional autoencoder architecture. The pedestal dataset used for training contains 105 frames. Despite its modest size, it is sufficient for learning the highly homogeneous noise morphology of the detector, illustrating the low calibration and data requirements of this approach. No explicit hot-pixel masking is applied: persistent camera hot pixels appear with a fixed p…
Figure 2: Example of synthetic perturbations injected during training of the refined autoencoder. From left to right: (a) clean pedestal frame used as reconstruction target; (b) corrupted input obtained by injecting synthetic curved strokes and Gaussian blobs with varying amplitude; (c) absolute difference between input and target (shown for visualization); (d) binary injection mask m used to up-weight the reconstru…
Figure 3: Representative Regions of Interest (ROIs) returned by the anomaly-detection framework. Each row shows: (a) the fiducialized camera image; (b) the anomaly map, where track-like structures appear as localized high-residual regions; (c) the final ROI mask after spatial aggregation. The ROIs reliably enclose particle-induced structures while excluding noise-dominated background regions.
Figure 4: Trade-off between mean signal-intensity coverage and mean area cut for the three anomaly-scoring approaches, obtained by sweeping the residual threshold τ. All methods share the same ROI-extraction pipeline and are evaluated on the same event sample. Curves closer to the top-right indicate stronger compression at fixed signal retention.
Figure 5: Signal-intensity coverage as a function of reconstructed event energy for the refined-training autoencoder. The method maintains high coverage across the full energy range. A small number of outliers at very low coverage are discussed separately in Section 4.5.
Figure 6: Representative examples of events with near-zero signal-intensity coverage. Each panel shows the original camera image and the corresponding ROI mask produced by the refined-training autoencoder, with pixels assigned to the event by the reference reconstruction overlaid. No clear track-like structures are visible in the raw images, indicating that these cases arise from reconstruction artifacts rather than…
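The refined training objective illustrated in Figure 2 up-weights reconstruction error under the binary injection mask m. One plausible rendering is a weighted per-pixel MSE; the weight value and the exact loss form are assumptions, since the paper's loss is not reproduced here.

```python
import numpy as np

def weighted_recon_loss(pred, target, inj_mask, w=10.0):
    """Per-pixel squared error, up-weighted by w where synthetic
    perturbations were injected (inj_mask True). The weight w and the
    use of MSE are illustrative assumptions, not the paper's exact loss."""
    err = (pred - target) ** 2
    weights = np.where(inj_mask, w, 1.0)
    return (weights * err).sum() / weights.sum()

rng = np.random.default_rng(1)
target = rng.normal(size=(8, 8))         # clean pedestal patch (target)
inj = np.zeros((8, 8), dtype=bool)
inj[2:4, 2:6] = True                     # injected-stroke mask
pred = target + inj * 0.5                # model fails only on injected pixels

print(weighted_recon_loss(pred, target, inj))
```

Up-weighting the injected pixels forces the network to reconstruct the clean pedestal underneath the perturbations, rather than learning to pass anomalies through — which is the failure mode the plain objective risks.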
Original abstract

Optical-readout Time Projection Chambers (TPCs) produce megapixel-scale images whose fine-grained topological information is essential for rare-event searches, but whose size challenges real-time data selection. We present an unsupervised, reconstruction-based anomaly-detection strategy for fast Region-of-Interest (ROI) extraction that operates directly on minimally processed camera frames. A convolutional autoencoder trained exclusively on pedestal images learns the detector noise morphology without labels, simulation, or fine-grained calibration. Applied to standard data-taking frames, localized reconstruction residuals identify particle-induced structures, from which compact ROIs are extracted via thresholding and spatial clustering. Using real data from the CYGNO optical TPC prototype, we compare two pedestal-trained autoencoder configurations that differ only in their training objective, enabling a controlled study of its impact. The best configuration retains (93.0 +/- 0.2)% of reconstructed signal intensity while discarding (97.8 +/- 0.1)% of the image area, with an inference time of approximately 25 ms per frame on a consumer GPU. The results demonstrate that careful design of the training objective is critical for effective reconstruction-based anomaly detection and that pedestal-trained autoencoders provide a transparent and detector-agnostic baseline for online data reduction in optical TPCs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents an unsupervised, reconstruction-based anomaly detection strategy for fast ROI extraction in optical TPCs. A convolutional autoencoder is trained exclusively on pedestal images to learn detector noise morphology; localized residuals on standard data frames are thresholded and clustered to define compact ROIs. On real CYGNO prototype data, two training-objective variants are compared, with the best achieving (93.0 ± 0.2)% retention of reconstructed signal intensity while discarding (97.8 ± 0.1)% of the image area at ~25 ms inference per frame on a consumer GPU. The work positions the approach as a transparent, simulation-free baseline for online data reduction.

Significance. If the performance metrics are robustly validated, the method offers a practical, detector-agnostic route to real-time data reduction for megapixel-scale optical TPC images in rare-event searches. The controlled comparison of training objectives demonstrates that objective design materially affects residual quality, providing transferable guidance for anomaly-detection pipelines in instrumentation. The low-latency inference on consumer hardware is a concrete operational advantage.

major comments (2)
  1. Results section (performance metrics): The headline claim of (93.0 ± 0.2)% signal retention is computed from reconstructed signal intensity on the same camera frames used to generate the ROIs. The manuscript does not state whether the downstream reconstruction operates on the full frame or is restricted to the autoencoder-selected ROIs; without this clarification or an independent ground-truth measure (e.g., external coincidence or injected charge), the retention figure risks partial circular dependence on the very reconstruction the method aims to accelerate.
  2. Methods and validation: No baseline comparisons (e.g., simple intensity thresholding or alternative unsupervised detectors) and no simulation cross-checks of the residual-to-signal mapping are reported, despite the abstract's quantitative claims on real data. The absence of these controls leaves the central performance numbers only moderately supported and makes it difficult to isolate the contribution of the autoencoder design.
minor comments (2)
  1. The inference time is stated as 'approximately 25 ms'; reporting the exact mean, standard deviation, and hardware configuration (GPU model, batch size) would improve reproducibility.
  2. The spatial clustering step used to form compact ROIs from thresholded residuals is described only at a high level; specifying the algorithm, connectivity criterion, and minimum-size cut would aid implementation.
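For reference, the aggregation step the report asks to be pinned down typically amounts to connected-component labelling of the thresholded mask followed by a minimum-size cut. A self-contained sketch (4-connectivity and the 5-pixel cut are assumptions, not the paper's settings):

```python
import numpy as np
from collections import deque

def extract_rois(mask, min_size=5):
    """Group thresholded pixels into 4-connected components and return
    bounding boxes (rmin, rmax, cmin, cmax) of components >= min_size."""
    seen = np.zeros_like(mask, dtype=bool)
    rois = []
    rows, cols = mask.shape
    for r in range(rows):
        for c in range(cols):
            if not mask[r, c] or seen[r, c]:
                continue
            comp, q = [], deque([(r, c)])   # BFS over one component
            seen[r, c] = True
            while q:
                i, j = q.popleft()
                comp.append((i, j))
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < rows and 0 <= nj < cols \
                            and mask[ni, nj] and not seen[ni, nj]:
                        seen[ni, nj] = True
                        q.append((ni, nj))
            if len(comp) >= min_size:        # drop isolated noise hits
                rs = [p[0] for p in comp]
                cs = [p[1] for p in comp]
                rois.append((min(rs), max(rs), min(cs), max(cs)))
    return rois

mask = np.zeros((20, 20), dtype=bool)
mask[5:7, 3:12] = True       # track-like component (18 px)
mask[15, 15] = True          # single noise pixel, below min_size
print(extract_rois(mask))    # → [(5, 6, 3, 11)]
```

The minimum-size cut is what suppresses isolated hot pixels and shot-noise fluctuations; reporting its value alongside the connectivity choice would make the pipeline reproducible.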

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments on our manuscript. We address each major comment below and have revised the manuscript to improve clarity and add requested controls where feasible.

read point-by-point responses
  1. Referee: Results section (performance metrics): The headline claim of (93.0 ± 0.2)% signal retention is computed from reconstructed signal intensity on the same camera frames used to generate the ROIs. The manuscript does not state whether the downstream reconstruction operates on the full frame or is restricted to the autoencoder-selected ROIs; without this clarification or an independent ground-truth measure (e.g., external coincidence or injected charge), the retention figure risks partial circular dependence on the very reconstruction the method aims to accelerate.

    Authors: We thank the referee for highlighting this potential ambiguity. The signal retention is computed by first applying the standard reconstruction algorithm to the entire camera frame to determine the total reconstructed signal intensity, then calculating the fraction of that intensity captured inside the ROIs defined from autoencoder residuals. The ROI selection depends only on reconstruction residuals and is independent of the signal-intensity measurement. We will revise the Results section to state this procedure explicitly. An independent ground-truth measure (e.g., external coincidence) is not available in the existing prototype dataset and would require new hardware and data-taking runs outside the present scope. revision: partial

  2. Referee: Methods and validation: No baseline comparisons (e.g., simple intensity thresholding or alternative unsupervised detectors) and no simulation cross-checks of the residual-to-signal mapping are reported, despite the abstract's quantitative claims on real data. The absence of these controls leaves the central performance numbers only moderately supported and makes it difficult to isolate the contribution of the autoencoder design.

    Authors: We agree that explicit baselines strengthen the claims. In the revised manuscript we will add a direct comparison with simple intensity thresholding, showing that the autoencoder yields higher signal retention at comparable area reduction. Because the method is deliberately simulation-free, we have not performed simulation cross-checks; instead, the controlled comparison of the two training objectives on real pedestal and data frames already isolates the effect of objective choice on residual quality. We will expand the Methods section to articulate this validation strategy. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical metrics measured on held-out experimental frames

full rationale

The paper trains a convolutional autoencoder exclusively on pedestal images and applies it to standard data-taking frames to extract ROIs via residuals, thresholding, and clustering. The headline performance figures (93.0 % signal retention, 97.8 % area discard) are stated as direct measurements of reconstructed signal intensity on real CYGNO prototype data, with no equations, self-citations, or definitions shown that reduce these quantities to fitted parameters or inputs defined by the same method. The derivation chain consists of an unsupervised training step followed by independent evaluation on separate frames; no load-bearing step collapses by construction to the inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the assumption that an autoencoder trained only on pedestal images will produce residuals that correspond to particle signals; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)
  • domain assumption An autoencoder trained exclusively on pedestal images learns detector noise morphology sufficiently well to flag particle-induced structures via reconstruction residuals.
    Invoked by the training strategy and the claim that residuals identify particle structures.

pith-pipeline@v0.9.0 · 5783 in / 1292 out tokens · 63809 ms · 2026-05-16T19:03:29.734303+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

16 extracted references · 16 canonical work pages · 1 internal anchor

  1. Fernando Domingues Amaro et al. The CYGNO Experiment. Instruments, 6(1):6, 2022.
  2. Fernando Domingues Amaro et al. Bayesian network 3D event reconstruction in the CYGNO optical TPC for dark matter direct detection. Eur. Phys. J. C, 85(11):1261, 2025.
  3. Fernando Domingues Amaro et al. The CYGNO experiment, a directional detector for direct Dark Matter searches. Nucl. Instrum. Meth. A, 1054:168325, 2023.
  4. F. D. Amaro et al. Directional iDBSCAN to detect cosmic-ray tracks for the CYGNO experiment. Measurement Science and Technology, 34(12):125024, 2023.
  5. Taoli Cheng, Jean-François Arguin, Julien Leissner-Martin, Jacinthe Pilette, and Tobias Golling. Variational autoencoders for anomalous jet tagging. Phys. Rev. D, 107(1):016002, 2023.
  6. Marco Farina, Yuichiro Nakai, and David Shih. Searching for New Physics with Deep Autoencoders. Phys. Rev. D, 101(7):075021, 2020.
  7. Andrew Blance, Michael Spannowsky, and Philip Waite. Adversarially-trained autoencoders for robust unsupervised new physics searches. JHEP, 10:047, 2019.
  8. Jan Hajer, Ying-Ying Li, Tao Liu, and He Wang. Novelty detection meets collider physics. Phys. Rev. D, 101:076015, 2020.
  9. Tuhin S. Roy and Aravind H. Vijay. A robust anomaly finder based on autoencoders. 2019.
  10. Haoqi Huang, Ping Wang, Jianhua Pei, Jiacheng Wang, Shahen Alexanian, and Dusit Niyato. Deep Learning Advancements in Anomaly Detection: A Comprehensive Survey. arXiv:2503.13195, 2025.
  11. Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org
  12. Dor Bank, Noam Koenigstein, and Raja Giryes. Autoencoders. Machine Learning for Data Science Handbook: Data Mining and Knowledge Discovery Handbook, pages 353–374, 2023.
  13. B. D. Almeida et al. Characterization of cutting-edge CMOS Active Pixel sensors within the CYGNO Experiment. 2025.
  14. Zhou Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004.
  15. Diederik P. Kingma and Jimmy Ba. Adam: A Method for Stochastic Optimization. CoRR, abs/1412.6980, 2014.
  16. G. M. Oppedisano. Trigger optimization and event classification for dark matter searches in the CYGNO experiment using machine learning. Master's thesis, Sapienza University of Rome, Rome, Italy, 2025. Advisor: A. Messina; Co-advisor: S. Piacentini.