Graph-Based Light-Curve Features for Robust Transient Classification

David J. Ruiz-Morales; D. Sierra-Porta; Jesús D. Petro-Ramos

arxiv: 2510.17721 · v2 · submitted 2025-10-20 · 🌌 astro-ph.IM

Pith reviewed 2026-05-18 05:53 UTC · model grok-4.3

classification 🌌 astro-ph.IM

keywords: visibility graphs, light curves, transient classification, graph features, astronomical time series, machine learning, MANTRA benchmark

The pith

Visibility graphs turn light curves into features that let standard classifiers identify astronomical transients with a macro-F1 of 0.622 on a quality-controlled MANTRA subset.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper converts irregular astronomical light curves into three visibility-graph representations: horizontal, directed, and weighted. From each graph it extracts a compact set of network statistics such as degree and strength moments, clustering coefficients, motifs, assortativity, path measures, and spectral summaries. These descriptors feed into tree-based models including LightGBM on a quality-controlled, class-balanced subset of 1705 objects drawn from the MANTRA benchmark. The best combination reaches macro-F1 of 0.622 and accuracy of 0.661, indicating that graph topology can serve as a survey-agnostic bridge from raw photometry to multiclass prediction without custom deep networks.
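The mapping from a light curve to a horizontal visibility graph can be sketched in a few lines. The sketch below applies the standard horizontal visibility rule (two samples are linked when every sample between them lies strictly below both). It is an illustration under that assumption, not the authors' released code; the function name `hvg_edges` and the toy values are hypothetical.

```python
def hvg_edges(values):
    """Return the undirected HVG edge list for a sequence of brightness values.

    Nodes i < j are linked when every sample strictly between them lies
    below min(values[i], values[j]). Only the time ordering of the samples
    is used, so irregular time gaps do not enter the construction.
    """
    n = len(values)
    edges = []
    for i in range(n - 1):
        highest_between = float("-inf")  # tallest sample strictly between i and j
        for j in range(i + 1, n):
            if highest_between < min(values[i], values[j]):
                edges.append((i, j))
            highest_between = max(highest_between, values[j])
            if highest_between >= values[i]:
                break  # a sample at least as tall as i blocks everything beyond it
    return edges


# Toy series (hypothetical values, not MANTRA data).
print(hvg_edges([3, 1, 2, 4, 1]))
# → [(0, 1), (0, 2), (0, 3), (1, 2), (2, 3), (3, 4)]
```

The early `break` keeps the loop short on noisy series, because a sample usually sees only a handful of neighbours before a taller one blocks its view.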

Core claim

Mapping each light curve to horizontal visibility graphs, directed horizontal visibility graphs, and weighted horizontal visibility graphs, then extracting degree/strength moments, clustering and motifs, assortativity, path/efficiency, and spectral summaries, allows LightGBM to attain a macro-F1 of 0.622 ± 0.010 and accuracy of 0.661 ± 0.010 on the filtered MANTRA subset, with the directed and weighted views supplying complementary information beyond undirected topology.
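To make the directed and weighted views concrete, the following sketch derives in-degree, out-degree, and node strength from an existing HVG edge list. Orienting each edge forward in time is the usual directed-HVG convention. The edge weight used here (absolute brightness difference between the two linked samples) is an assumption chosen for illustration, since the weighting scheme is not spelled out on this page. The function name and the toy inputs are hypothetical.

```python
def directed_and_weighted(values, edges):
    """Summarise directed and weighted views of an HVG.

    values: brightness samples in time order.
    edges:  HVG edges as (i, j) pairs with i < j.
    Returns (k_in, k_out, strength), one entry per node.
    """
    n = len(values)
    k_in = [0] * n      # links arriving from earlier samples
    k_out = [0] * n     # links leaving towards later samples
    strength = [0] * n  # sum of incident edge weights
    for i, j in edges:
        k_out[i] += 1
        k_in[j] += 1
        weight = abs(values[i] - values[j])  # ASSUMED weight definition
        strength[i] += weight
        strength[j] += weight
    return k_in, k_out, strength


# Toy series and its HVG edges (hypothetical values).
values = [3, 1, 2, 4, 1]
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (2, 3), (3, 4)]
k_in, k_out, strength = directed_and_weighted(values, edges)
print(k_in)      # → [0, 1, 2, 2, 1]
print(k_out)     # → [3, 1, 1, 1, 0]
print(strength)  # → [4, 3, 4, 6, 3]
```

A mismatch between the in-degree and out-degree distributions is one way time asymmetry in a series shows up, which is the kind of signal an undirected graph cannot carry.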

What carries the argument

The three visibility-graph views (HVG, DHVG, W-HVG) of photometric time series, from which length-aware network descriptors are extracted to form input features for tree-based classifiers.
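As a rough illustration of what extracting network descriptors involves, the sketch below computes four simple statistics from an undirected edge list: mean degree, degree standard deviation, average local clustering, and edge density. This is a small subset chosen for illustration; the paper's exact descriptor definitions, including how they are made length-aware, are not reproduced here, and the function name is hypothetical.

```python
from math import sqrt


def graph_descriptors(n_nodes, edges):
    """Compute a few simple descriptors of an undirected graph (n_nodes >= 2)."""
    neighbours = [set() for _ in range(n_nodes)]
    for i, j in edges:
        neighbours[i].add(j)
        neighbours[j].add(i)

    degrees = [len(nb) for nb in neighbours]
    mean_k = sum(degrees) / n_nodes
    std_k = sqrt(sum((k - mean_k) ** 2 for k in degrees) / n_nodes)

    # Local clustering: fraction of a node's neighbour pairs that are linked.
    clustering = []
    for v in range(n_nodes):
        nb = sorted(neighbours[v])
        k = len(nb)
        if k < 2:
            clustering.append(0.0)  # no neighbour pairs to test
            continue
        linked_pairs = sum(
            1
            for a in range(k)
            for b in range(a + 1, k)
            if nb[b] in neighbours[nb[a]]
        )
        clustering.append(2.0 * linked_pairs / (k * (k - 1)))

    return {
        "mean_degree": mean_k,
        "std_degree": std_k,
        "avg_clustering": sum(clustering) / n_nodes,
        "density": 2.0 * len(edges) / (n_nodes * (n_nodes - 1)),
    }


# HVG edges of the toy series [3, 1, 2, 4, 1] (hypothetical values).
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (2, 3), (3, 4)]
features = graph_descriptors(5, edges)
print(features)
```

Each light curve then contributes one fixed-length feature vector regardless of how many epochs it has, which is what allows a tree-based classifier to consume series of different lengths.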

If this is right

Weighted contrasts and directed asymmetry add complementary gains beyond undirected topology.
Strong separation occurs for CV, HPM, and Non-Tr. classes while residual confusions concentrate in the AGN-Blazar-SN block.
The method yields competitive multiclass results on quality-controlled data without requiring bespoke deep architectures.
The approach is demonstrated only on objects with a minimum coverage of 100 epochs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same graph descriptors could be tested on light curves from other surveys to check survey-agnostic behavior.
Combining these features with existing photometric statistics might further reduce confusions in the AGN-Blazar-SN group.
Releasing object IDs and code makes it straightforward to verify the numbers or extend the feature set on new transient catalogs.

Load-bearing premise

The chosen visibility-graph descriptors retain enough discriminative information from the original irregular photometric series to support multiclass separation after quality filtering.

What would settle it

Re-running the identical LightGBM pipeline on the released list of 1705 object IDs would directly test the reported performance; a macro-F1 below 0.60 would indicate that it does not hold.

Original abstract

We investigate graph-based representations of astronomical light curves for transient classification on a quality-controlled, class-balanced subset of the MANTRA benchmark (minimum coverage N_min=100 epochs; N=1705 objects after filtering and Non-Tr. subsampling). Each series is mapped to three visibility-graph views -- horizontal (HVG), directed (DHVG), and weighted (W-HVG) -- from which we extract compact, length-aware network descriptors (degree/strength moments, clustering and motifs, assortativity, path/efficiency, and spectral summaries). Using object-level stratified five-fold validation and tree-based learners, the best configuration (LightGBM with HVG+DHVG+W-HVG features) attains a macro-F1 of 0.622 +/- 0.010 and accuracy of 0.661 +/- 0.010 on this subset. For context, the published MANTRA baseline reports F1_macro=0.528 on the full dataset; because class priors differ after quality control, this reference is not a like-for-like comparison. Ablations show that weighted contrasts and directed asymmetry contribute complementary gains to undirected topology. Per-class analysis highlights strong performance for CV, HPM, and Non-Tr., with residual confusions concentrated in the AGN-Blazar-SN block. These results indicate that visibility graphs offer a simple, survey-agnostic bridge between irregular photometric time series and standard classifiers, yielding competitive multiclass performance without bespoke deep architectures. We release code and feature definitions, together with the list of object IDs used in the evaluation subset, to facilitate reproducibility and future extensions.
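Because the abstract reports both macro-F1 and accuracy and notes that class priors matter, a small worked example of the two metrics may help. The sketch below computes them from scratch on made-up labels; the class names are borrowed from the text, but the predictions are hypothetical and unrelated to the paper's results.

```python
def accuracy(y_true, y_pred):
    """Fraction of objects whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels: the single AGN object is missed entirely.
y_true = ["SN", "SN", "SN", "CV", "CV", "AGN"]
y_pred = ["SN", "SN", "AGN", "CV", "CV", "SN"]
print(round(accuracy(y_true, y_pred), 3))  # → 0.667
print(round(macro_f1(y_true, y_pred), 3))  # → 0.556
```

Macro-F1 weights every class equally, so a rare class that is always missed pulls the score down more than it pulls accuracy down. That is one reason scores on subsets with different class priors are not directly comparable.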

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Visibility graphs give a practical feature set for light-curve classification on filtered data, but the gains over standard methods are not isolated by a matched baseline.

This paper shows that visibility graphs can provide useful features for classifying astronomical transients on a quality-filtered dataset, reaching a macro-F1 of about 0.62 with LightGBM. The approach is practical, but the advantage over standard features is not yet fully demonstrated.

What is actually new is the specific setup with three visibility-graph views (horizontal, directed, and weighted) plus the collection of network descriptors such as degree moments, clustering coefficients, motifs, assortativity, and spectral summaries, all applied to transient light curves on the MANTRA benchmark. While graph methods for time series are not new, this combination tailored to photometric transients and tested with object-level cross-validation has not been reported before in the references they cite.

The paper does well on the experimental side by using stratified five-fold validation, reporting error bars, and including ablations that separate the contributions of the different graph views. Releasing the code, the exact feature definitions, and the list of 1705 object IDs used is a real help for reproducibility.

The soft spot is the baseline. They compare against the published MANTRA F1 of 0.528, but note that their subset has different class distributions after filtering for N_min=100 and subsampling. No experiment runs a conventional feature set, such as statistical summaries or Lomb-Scargle features, on the identical filtered objects and folds. This leaves open whether the graph descriptors add unique value or whether the cleaner, longer light curves simply make classification easier for any reasonable feature vector.

This work is for people in astro-informatics or survey data analysis who are looking for feature-engineering options for transient classification without jumping to deep learning models. A reader focused on practical tools for handling irregular time series in large catalogs would get concrete value from the details and the open resources.

It deserves a serious referee because the method is clearly described, the results are presented with appropriate controls for what they did, and the reproducibility measures make it straightforward to evaluate. I would recommend sending it to peer review.

Referee Report

2 major / 2 minor

Summary. The manuscript investigates graph-based representations of astronomical light curves for transient classification. Each light curve is converted to horizontal (HVG), directed (DHVG), and weighted (W-HVG) visibility graphs, from which compact network descriptors (degree/strength moments, clustering, motifs, assortativity, path measures, spectral summaries) are extracted. On a post-filtered, class-balanced subset of the MANTRA benchmark (N_min=100 epochs, N=1705 objects after Non-Tr. subsampling), LightGBM with the combined HVG+DHVG+W-HVG feature set attains macro-F1 of 0.622 ± 0.010 and accuracy of 0.661 ± 0.010 under object-level stratified five-fold cross-validation. Ablations indicate complementary gains from weighted and directed views; per-class results are strong for CV, HPM, and Non-Tr. but show residual confusions in the AGN-Blazar-SN group. Code, feature definitions, and the object-ID list are released.

Significance. If the discriminative power of the visibility-graph descriptors holds under proper controls, the work supplies a simple, survey-agnostic, and computationally lightweight feature pipeline that maps irregular photometric series directly to off-the-shelf classifiers without bespoke deep architectures. Explicit strengths include the release of code and the exact evaluation list, the use of stratified five-fold validation with error bars, and ablations that isolate contributions from directed asymmetry and weighted contrasts. These elements support reproducibility and incremental extension by the community.

major comments (2)

Abstract and Results section: The headline result (LightGBM + HVG+DHVG+W-HVG, macro-F1 0.622 ± 0.010) is offered as competitive with the published MANTRA F1_macro=0.528. The manuscript correctly notes that class priors differ after N_min=100 filtering and Non-Tr. subsampling, rendering the numbers non-comparable. However, no control experiment applies any conventional feature set (statistical moments, Lomb-Scargle summaries, etc.) to the identical 1705-object list under the same learner and same object-level 5-fold CV. Without this baseline, the specific contribution of the visibility-graph descriptors cannot be isolated from the effects of quality filtering and balancing. This is load-bearing for the central claim that the graph features provide a competitive bridge to multiclass classification.
Methods and Ablation sections: The claim that the chosen descriptors (degree moments, clustering, motifs, assortativity, path/efficiency, spectral summaries) retain sufficient discriminative information rests on performance after aggressive filtering to high-coverage objects. No direct test of information retention—such as a comparison of classification performance using the raw light-curve statistics versus the graph-derived features on the same objects—is reported. This leaves open whether simpler descriptors would achieve similar scores on this particular subset.
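The control experiment requested in both major comments depends on evaluating graph features and conventional features on identical folds. One way to guarantee that is to fix the fold assignment per object ID before any features are computed. The sketch below does this with a seeded, class-stratified round-robin and pairs it with a few basic light-curve moments. All names, the seed, and the choice of moments are assumptions for illustration; Lomb-Scargle summaries are omitted.

```python
import random
from math import sqrt


def stratified_folds(labels_by_id, n_folds=5, seed=0):
    """Map each object ID to a fold index, spreading every class across folds."""
    by_class = {}
    for oid, label in labels_by_id.items():
        by_class.setdefault(label, []).append(oid)

    rng = random.Random(seed)
    fold_of = {}
    for label in sorted(by_class):
        ids = sorted(by_class[label])  # sort so the result ignores dict order
        rng.shuffle(ids)
        for position, oid in enumerate(ids):
            fold_of[oid] = position % n_folds  # deal IDs round-robin
    return fold_of


def basic_moments(values):
    """Mean, population standard deviation, skewness, and half-range."""
    n = len(values)
    mean = sum(values) / n
    std = sqrt(sum((v - mean) ** 2 for v in values) / n)
    third = sum((v - mean) ** 3 for v in values) / n
    return {
        "mean": mean,
        "std": std,
        "skewness": 0.0 if std == 0 else third / std ** 3,
        "amplitude": (max(values) - min(values)) / 2,
    }


# Hypothetical object IDs and labels, five objects per class.
labels = {f"obj{i}": ("SN" if i < 5 else "CV") for i in range(10)}
folds = stratified_folds(labels)
print(sorted(folds.items()))
print(basic_moments([1, 2, 3, 4, 5]))
```

Because the fold map depends only on the object IDs, labels, and seed, any feature set computed later (graph-based or conventional) is trained and scored on exactly the same partitions.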

minor comments (2)

Abstract: The statement that code and the object-ID list are released would benefit from an explicit repository URL or DOI to improve immediate accessibility for readers.
Per-class analysis: The residual confusions concentrated in the AGN-Blazar-SN block are noted qualitatively; inclusion of a normalized confusion matrix or pairwise F1 scores would allow quantitative assessment of the severity of these confusions.
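The normalized confusion matrix suggested in the second minor comment is cheap to produce. The sketch below normalizes each row by the number of true objects in that class, so entry (r, c) reads as the fraction of class r predicted as class c. The labels are hypothetical and the function name is an assumption.

```python
def row_normalised_confusion(y_true, y_pred):
    """Return (classes, matrix) with each row summing to 1 (or all zeros)."""
    classes = sorted(set(y_true) | set(y_pred))
    index = {c: k for k, c in enumerate(classes)}
    counts = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        counts[index[t]][index[p]] += 1  # row = true class, column = prediction

    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return classes, matrix


# Hypothetical labels, not results from the paper.
y_true = ["SN", "SN", "SN", "CV", "CV", "AGN"]
y_pred = ["SN", "SN", "AGN", "CV", "CV", "SN"]
classes, matrix = row_normalised_confusion(y_true, y_pred)
for name, row in zip(classes, matrix):
    print(name, [round(x, 2) for x in row])
```

Off-diagonal entries within the AGN, Blazar, and SN rows would then quantify how severe the reported confusions are.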

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed review. The comments highlight important points regarding the isolation of our graph-based features' contribution, and we address each below with plans for revision.

Referee: Abstract and Results section: The headline result (LightGBM + HVG+DHVG+W-HVG, macro-F1 0.622 ± 0.010) is offered as competitive with the published MANTRA F1_macro=0.528. The manuscript correctly notes that class priors differ after N_min=100 filtering and Non-Tr. subsampling, rendering the numbers non-comparable. However, no control experiment applies any conventional feature set (statistical moments, Lomb-Scargle summaries, etc.) to the identical 1705-object list under the same learner and same object-level 5-fold CV. Without this baseline, the specific contribution of the visibility-graph descriptors cannot be isolated from the effects of quality filtering and balancing. This is load-bearing for the central claim that the graph features provide a competitive bridge to multiclass classification.

Authors: We agree that a direct baseline using conventional features on the identical 1705-object filtered subset under the same LightGBM learner and object-level stratified 5-fold CV is necessary to isolate the specific contribution of the visibility-graph descriptors from the effects of quality filtering and class balancing. In the revised manuscript we will add this control experiment, extracting standard statistical moments and Lomb-Scargle summaries from the same light curves and reporting the resulting macro-F1 and accuracy for comparison. revision: yes
Referee: Methods and Ablation sections: The claim that the chosen descriptors (degree moments, clustering, motifs, assortativity, path/efficiency, spectral summaries) retain sufficient discriminative information rests on performance after aggressive filtering to high-coverage objects. No direct test of information retention—such as a comparison of classification performance using the raw light-curve statistics versus the graph-derived features on the same objects—is reported. This leaves open whether simpler descriptors would achieve similar scores on this particular subset.

Authors: We concur that a head-to-head comparison of raw light-curve statistics against the graph-derived features on the exact same 1705 high-coverage objects would provide clearer evidence that the network descriptors retain discriminative information beyond simpler summaries. We will include this analysis in the revised manuscript by computing basic statistical features directly from the raw series of these objects and evaluating them under the identical cross-validation protocol. revision: yes

Circularity Check

0 steps flagged

Standard feature extraction plus supervised classification; no derivations reduce to inputs by construction

The paper applies visibility-graph mappings (HVG, DHVG, W-HVG) to light curves, extracts standard network statistics (degree moments, clustering, motifs, etc.), and feeds the resulting feature vectors into off-the-shelf classifiers (LightGBM, etc.) under object-level 5-fold CV. Reported macro-F1 and accuracy are direct empirical outcomes of this pipeline on the filtered 1705-object subset. No equations, ansatzes, or self-citations are invoked to derive or force the performance numbers; the workflow is self-contained against the external MANTRA benchmark (with explicit caveats on subset differences). No self-definitional, fitted-input-called-prediction, or load-bearing self-citation patterns appear.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that visibility graphs preserve classification-relevant structure in irregularly sampled light curves and on standard supervised-learning assumptions about feature independence and cross-validation validity.

free parameters (1)

N_min
Threshold of 100 epochs used to define the quality-controlled subset; directly affects which objects enter the evaluation.
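A minimal sketch of how such a threshold shapes the evaluation set is given below. It keeps objects with at least `n_min` epochs and optionally subsamples the Non-Tr. class. The cap `max_non_tr`, the seed, and the data layout are hypothetical; the actual subsampling procedure used for the 1705-object subset is not described on this page.

```python
import random


def select_subset(epoch_counts, labels, n_min=100, max_non_tr=None, seed=0):
    """Return sorted object IDs that pass the coverage cut.

    epoch_counts: dict mapping object ID -> number of epochs.
    labels:       dict mapping object ID -> class label.
    max_non_tr:   optional cap on retained "Non-Tr." objects (ASSUMED knob).
    """
    kept = [oid for oid, n in epoch_counts.items() if n >= n_min]
    if max_non_tr is not None:
        non_tr = sorted(oid for oid in kept if labels[oid] == "Non-Tr.")
        if len(non_tr) > max_non_tr:
            chosen = set(random.Random(seed).sample(non_tr, max_non_tr))
            kept = [
                oid for oid in kept
                if labels[oid] != "Non-Tr." or oid in chosen
            ]
    return sorted(kept)


# Hypothetical catalogue.
epoch_counts = {"a": 150, "b": 99, "c": 100, "d": 300, "e": 120}
labels = {"a": "SN", "b": "CV", "c": "Non-Tr.", "d": "Non-Tr.", "e": "Non-Tr."}
print(select_subset(epoch_counts, labels))  # → ['a', 'c', 'd', 'e']
print(len(select_subset(epoch_counts, labels, max_non_tr=2)))  # → 3
```

Raising or lowering `n_min` changes both the sample size and the class mix, which is why it counts as a free parameter of the evaluation.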

axioms (1)

domain assumption: Visibility graphs constructed from light curves retain sufficient temporal and amplitude information for multiclass discrimination.
Invoked when mapping photometric series to HVG/DHVG/W-HVG and extracting descriptors for classification.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · relation between the paper passage and the cited Recognition theorem: unclear

Paper passage: "Each series is mapped to three visibility-graph views -- horizontal (HVG), directed (DHVG), and weighted (W-HVG) -- from which we extract compact, length-aware network descriptors (degree/strength moments, clustering and motifs, assortativity, path/efficiency, and spectral summaries)."

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.