Fun-TSG: A Function-Driven Multivariate Time Series Generator with Variable-Level Anomaly Labeling

Pierre Lotte (EPE UT, IRIT), André Péninou (UT2J, IRIT-SIG), Olivier Teste (IRIT-SIG, IRIT, UT2J, Comue de Toulouse)

arxiv: 2604.14221 · v1 · submitted 2026-04-14 · 💻 cs.AI

Pith reviewed 2026-05-10 15:51 UTC · model grok-4.3

classification 💻 cs.AI

keywords multivariate time series · anomaly detection · synthetic data generation · benchmarking · variable-level labeling · function-driven modeling · evaluation framework

The pith

Fun-TSG generates multivariate time series with explicit dependencies and variable-level anomaly labels for precise detector evaluation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Fun-TSG to overcome shortcomings in existing benchmarks for anomaly detection in multivariate time series. Those benchmarks often lack fine-grained labels, explicit dependency structures, and details on how the data was produced. Fun-TSG supports both random sampling of dependencies and anomalies and user-specified equations, and in either case supplies ground-truth labels at the level of individual variables and timestamps. This setup lets researchers build controlled, reproducible test cases that support detailed analysis of how models perform on specific anomaly types and variables. A reader would care because better benchmarks could accelerate reliable progress in detecting anomalies across fields that rely on time series monitoring.

Core claim

Fun-TSG is a fully customizable time series generator that supports automated generation based on randomly sampled dependency structures and anomaly types, as well as manual generation through user-defined equations and anomaly configurations. In both modes it maintains full transparency over the generative process and supplies ground-truth anomaly labels at the variable and timestamp levels, enabling diverse, interpretable, and reproducible benchmarking scenarios for both classical and modern anomaly detection models.

What carries the argument

Fun-TSG, a function-driven generator that models inter-variable and temporal dependencies through equations while injecting controllable anomalies with variable-specific and timestamp-specific labels.
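To make the idea concrete, here is a minimal sketch of function-driven generation with variable-level labels. It is not Fun-TSG's code or API: the function name `generate`, the three equations, the noise level, and the anomaly window are all hypothetical choices made for illustration. The sketch labels only the variable where the anomaly is injected; how the actual tool labels downstream variables affected through a dependency is not stated in the text above.

```python
import math
import random


def generate(n_steps, seed=0):
    """Toy generator: three variables linked by equations, one injected anomaly.

    All equations and the anomaly configuration are illustrative assumptions.
    """
    rng = random.Random(seed)
    names = ["x0", "x1", "x2"]
    series = {name: [0.0] * n_steps for name in names}
    labels = {name: [0] * n_steps for name in names}  # one label per variable and timestamp

    # Hypothetical anomaly configuration: a level shift on x1 over [60, 70).
    start, end, offset = 60, 70, 3.0

    for t in range(n_steps):
        # x0: seasonal driver with no parents.
        series["x0"][t] = math.sin(2 * math.pi * t / 25) + rng.gauss(0, 0.05)

        # x1: temporal dependency on x0 with a lag of 2 steps.
        lagged = series["x0"][t - 2] if t >= 2 else 0.0
        series["x1"][t] = 0.8 * lagged + rng.gauss(0, 0.05)
        if start <= t < end:
            series["x1"][t] += offset
            labels["x1"][t] = 1  # label only the variable that was perturbed

        # x2: depends on its own previous value and on x1 at the same timestamp.
        prev = series["x2"][t - 1] if t >= 1 else 0.0
        series["x2"][t] = 0.5 * prev + 0.3 * series["x1"][t] + rng.gauss(0, 0.05)

    return series, labels


series, labels = generate(100)
```

Because the seed fixes the random stream, calling `generate` twice with the same arguments yields identical series and labels, which is the reproducibility property the summary emphasizes.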

If this is right

Researchers gain the ability to create fully reproducible test scenarios that include known ground truth for comparing model performance.
Fine-grained analysis becomes possible, showing how models behave on particular variables or specific anomaly types.
The dual automated and manual modes allow tailoring of benchmark difficulty and structure to match different evaluation goals.
Transparency into the generative equations supports interpretability studies of detection models.
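As an illustration of the fine-grained analysis mentioned above, the following sketch scores a detector separately for each variable. The function name `per_variable_f1` and the dictionary layout (variable name mapped to a list of 0/1 labels per timestamp) are assumptions for this example, not a format defined by the paper.

```python
def per_variable_f1(truth, pred):
    """Return an F1 score per variable from binary ground-truth and predicted labels.

    truth and pred map each variable name to a list of 0/1 values, one per timestamp.
    The score is None when a variable has no true anomalies and no predictions,
    because F1 is undefined in that case.
    """
    scores = {}
    for name, y_true in truth.items():
        y_pred = pred[name]
        tp = sum(1 for a, b in zip(y_true, y_pred) if a == 1 and b == 1)
        fp = sum(1 for a, b in zip(y_true, y_pred) if a == 0 and b == 1)
        fn = sum(1 for a, b in zip(y_true, y_pred) if a == 1 and b == 0)
        denom = 2 * tp + fp + fn
        scores[name] = (2 * tp / denom) if denom else None
    return scores


# Hypothetical labels for two variables over four timestamps.
truth = {"a": [0, 1, 1, 0], "b": [0, 0, 0, 0]}
pred = {"a": [0, 1, 0, 0], "b": [0, 0, 1, 0]}
scores = per_variable_f1(truth, pred)
```

A detector that flags the right timestamps on the wrong variable scores well on timestamp-level metrics but poorly here, which is the distinction variable-level labels make visible.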

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the generated series capture key statistical properties of real data, the tool could reduce dependence on scarce labeled real-world datasets for initial model development.
The equation-based approach might be extended to test how well detectors handle specific dependency patterns that are hard to observe in practice.
Custom labeling at the variable level could help diagnose whether models are truly localizing anomalies or merely reacting to overall signal changes.

Load-bearing premise

The data produced by random or user-defined functions and anomalies will be realistic enough to yield performance insights that transfer to real-world anomaly detection tasks.

What would settle it

An experiment in which the relative performance ranking of several anomaly detection models on Fun-TSG data differs substantially from their ranking on established real-world multivariate time series datasets would indicate the generator fails to produce representative test cases.
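One way to quantify such a ranking comparison is a rank correlation between per-model scores on the two kinds of data. The sketch below computes Kendall's tau (the tau-a variant, where tied pairs count as neither concordant nor discordant). The model names and scores are made-up placeholders, not results from the paper.

```python
from itertools import combinations


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two score dictionaries keyed by model name."""
    models = sorted(scores_a)
    if len(models) < 2:
        raise ValueError("need at least two models to compare rankings")
    concordant = discordant = 0
    for m1, m2 in combinations(models, 2):
        diff_a = scores_a[m1] - scores_a[m2]
        diff_b = scores_b[m1] - scores_b[m2]
        if diff_a * diff_b > 0:
            concordant += 1  # both benchmarks order this pair the same way
        elif diff_a * diff_b < 0:
            discordant += 1  # the two benchmarks disagree on this pair
    n_pairs = len(models) * (len(models) - 1) // 2
    return (concordant - discordant) / n_pairs


# Placeholder scores for three hypothetical detectors.
synthetic = {"m1": 0.9, "m2": 0.7, "m3": 0.5}
real = {"m1": 0.6, "m2": 0.8, "m3": 0.4}
tau = kendall_tau(synthetic, real)
```

A value near 1 would mean the synthetic benchmark orders models as the real datasets do; a value near 0 or below would be the kind of disagreement described above.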

Figures

Figures reproduced from arXiv: 2604.14221 by Pierre Lotte, André Péninou, Olivier Teste.

Original abstract

Reliable evaluation of anomaly detection methods in multivariate time series remains an open challenge, largely due to the limitations of existing benchmark datasets. Current resources often lack fine-grained anomaly annotations, do not provide explicit intervariable and temporal dependencies, and offer little insight into the underlying generative mechanisms. These shortcomings hinder the development and rigorous comparison of detection models, especially those targeting interpretable and variable-specific outputs. To address this gap, we introduce Fun-TSG, a fully customizable time series generator designed to support high-quality evaluation of anomaly detection systems. Our tool enables both fully automated generation, based on randomly sampled dependency structures and anomaly types, and manual generation through user-defined equations and anomaly configurations. In both cases, it provides full transparency over the data generation process, including access to ground-truth anomaly labels at the variable and timestamp levels. Fun-TSG supports the creation of diverse, interpretable, and reproducible benchmarking scenarios, enabling fine-grained performance analysis for both classical and modern anomaly detection models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Fun-TSG offers a customizable generator for multivariate time series with variable-level anomaly labels, but the paper gives no evidence that the outputs are realistic enough to support transferable benchmarking.

Fun-TSG is a new tool for generating multivariate time series data that includes precise labels for anomalies at both the variable and timestamp levels. It can run in automated mode by sampling dependencies and anomaly types randomly or in manual mode where users define equations and configurations. The transparency over the generation process is a plus for reproducibility.

What the paper does well is outline a system that fills some gaps in existing benchmarks, like the lack of fine-grained annotations and insight into generative mechanisms. The dual modes give flexibility, which could help in creating varied test cases for anomaly detection research.

The soft spots are clear though. The description stays at the level of intended capabilities without showing any generated examples, statistical comparisons to real data, or experiments with actual anomaly detectors. There's no evidence presented that the outputs are realistic enough that performance on them would transfer to real-world scenarios. This makes the claim of supporting high-quality evaluation hard to assess right now. It would help to see things like fidelity metrics or transfer tests, but those are missing.

The paper is mainly for researchers developing or evaluating anomaly detection methods in time series who want more control over their test data. A reader who needs synthetic benchmarks with known ground truth would find the concept relevant, especially if they are working on interpretable models.

I would recommend sending it for peer review. The idea has merit as a contribution to evaluation infrastructure, and referees could push for the necessary validation experiments to strengthen it. Even without those yet, the work is coherent and worth considering for publication after revisions.

Referee Report

2 major / 2 minor

Summary. The paper introduces Fun-TSG, a fully customizable multivariate time series generator supporting automated generation via randomly sampled dependency structures and anomaly types, as well as manual generation through user-defined equations and anomaly configurations, with full transparency and ground-truth anomaly labels at variable and timestamp levels to enable high-quality evaluation of anomaly detection systems.

Significance. If the generated data can be shown to be realistic and representative, the tool could meaningfully address gaps in existing benchmarks by providing controllable, reproducible scenarios with explicit dependencies and fine-grained labels, facilitating more rigorous comparisons of classical and modern anomaly detection models, especially those emphasizing interpretability and variable-specific outputs.

major comments (2)

[Abstract] The central claim that Fun-TSG enables 'high-quality evaluation' of anomaly detection systems rests on the assumption that its generated series are sufficiently realistic and representative, yet the manuscript contains no validation experiments, fidelity metrics (e.g., statistical property comparisons or transfer performance tests), example outputs, or comparisons to existing generators.
[The manuscript] No section demonstrates that randomly sampled dependencies or user equations produce data whose dependence structures, anomaly semantics, or statistical properties transfer to real-world multivariate time series, which is load-bearing for the utility claim.

minor comments (2)

The description of the automated and manual modes would benefit from pseudocode or explicit algorithmic steps to improve reproducibility.
Consider clarifying the exact functional forms used for dependency modeling and anomaly injection in the manual mode.
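To show the kind of explicit algorithmic step the first minor comment asks for, here is one hypothetical way an automated mode could sample a dependency structure. This is not the paper's procedure, which the text above does not specify: the function `sample_dependencies`, its parameters, and the choice to draw parents only from lower-indexed variables (which guarantees an acyclic graph) are all assumptions.

```python
import random


def sample_dependencies(n_vars, max_parents=2, max_lag=3, seed=0):
    """Sample a random acyclic dependency structure over n_vars variables.

    Returns a dict mapping each variable index to a list of
    (parent_index, lag, weight) triples. Hypothetical scheme for illustration.
    """
    rng = random.Random(seed)
    deps = {}
    for child in range(n_vars):
        # Only lower-indexed variables may be parents, so no cycle can form.
        candidates = list(range(child))
        n_parents = rng.randint(0, min(max_parents, len(candidates)))
        parents = rng.sample(candidates, n_parents)
        deps[child] = [
            (parent, rng.randint(0, max_lag), rng.uniform(-1.0, 1.0))
            for parent in parents
        ]
    return deps


deps = sample_dependencies(5)
```

Recording the seed together with the returned structure would be enough to regenerate the same scenario, which is the level of detail that makes an automated mode reproducible.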

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments highlighting the need to substantiate the utility of generated data. We address each major comment below and outline planned revisions.

Referee: [Abstract] The central claim that Fun-TSG enables 'high-quality evaluation' of anomaly detection systems rests on the assumption that its generated series are sufficiently realistic and representative, yet the manuscript contains no validation experiments, fidelity metrics (e.g., statistical property comparisons or transfer performance tests), example outputs, or comparisons to existing generators.

Authors: We agree that the current manuscript lacks explicit validation experiments, fidelity metrics, example outputs, and direct comparisons to existing generators. The core contribution is a transparent generator that supplies controllable dependencies, anomaly types, and fine-grained ground-truth labels at variable and timestamp levels, enabling reproducible evaluation scenarios that are difficult to obtain from real data. We do not claim statistical equivalence to any specific real-world dataset. In the revision we will add a new subsection with example outputs, basic statistical summaries of generated series (e.g., correlation structures and anomaly injection effects), and qualitative comparisons to representative existing generators.

Revision: yes

Referee: [The manuscript] No section demonstrates that randomly sampled dependencies or user equations produce data whose dependence structures, anomaly semantics, or statistical properties transfer to real-world multivariate time series, which is load-bearing for the utility claim.

Authors: The manuscript emphasizes controllability and transparency rather than automatic statistical transfer to real data. Random sampling and user-defined equations are intended to let practitioners construct known, interpretable scenarios for rigorous testing of detection models, not to replicate any particular real-world distribution. We acknowledge that no section currently demonstrates transfer performance. We will add illustrative examples showing how common real-world patterns (seasonality, lagged dependencies, point and contextual anomalies) can be expressed via the provided mechanisms, together with guidance on how users may validate their own generated data against target domains. A comprehensive transfer study lies outside the scope of this tool-description paper but can be noted as future work.

Revision: partial
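The distinction between point and contextual anomalies named in this response can be sketched on a seasonal signal. The helper names, the 24-step period, and the half-period phase swap below are illustrative assumptions, not mechanisms taken from the paper.

```python
import math


def seasonal(n_steps, period=24):
    """Noise-free sine wave used as a stand-in for a seasonal variable."""
    return [math.sin(2 * math.pi * t / period) for t in range(n_steps)]


def inject_point(values, t, magnitude=5.0):
    """Point anomaly: a single spike far outside the normal value range."""
    out = list(values)
    out[t] += magnitude
    labels = [0] * len(values)
    labels[t] = 1
    return out, labels


def inject_contextual(values, start, length, period=24):
    """Contextual anomaly: values stay in the normal range but occur at the wrong phase.

    Each affected timestamp takes the value found half a period away.
    """
    out = list(values)
    labels = [0] * len(values)
    for t in range(start, start + length):
        out[t] = values[(t + period // 2) % len(values)]
        labels[t] = 1
    return out, labels


base = seasonal(96)
spiked, spike_labels = inject_point(base, 10)
shifted, shift_labels = inject_contextual(base, 30, 6)
```

The spike is detectable by a simple global threshold, while the phase-swapped segment never leaves the interval [-1, 1] and can only be caught by a model that accounts for the seasonal context.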

Circularity Check

0 steps flagged

No circularity: tool description with no derivation chain or self-referential equations

The paper introduces Fun-TSG as a software generator for multivariate time series with anomaly labels. It describes automated random sampling of dependencies/anomalies and manual user-defined equations, plus transparency features. No mathematical derivation, prediction step, or fitted parameter is presented that could reduce to its own inputs. No self-citation load-bearing claims, uniqueness theorems, or ansatz smuggling appear in the provided text. The central contribution is a customizable tool rather than a closed-form result or benchmark claim that loops back on itself. Absence of any load-bearing equation or theorem means the circularity patterns do not apply.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The contribution rests on the domain assumption that synthetic time series generated from functions or random structures can serve as valid proxies for real data evaluation, with no free parameters fitted inside the paper and no new physical or mathematical entities postulated.

axioms (1)

domain assumption: Multivariate time series can be meaningfully generated from user-specified equations and randomly sampled dependency structures.
Invoked in the description of both automated and manual generation modes.

pith-pipeline@v0.9.0 · 5505 in / 1269 out tokens · 45538 ms · 2026-05-10T15:51:18.375810+00:00 · methodology
