
arxiv: 2606.11844 · v1 · pith:VCFZAJB2 · submitted 2026-06-10 · 💻 cs.LG

TaskFusion: Continual Anomaly Detection for Heterogeneous Tabular Data

Pith reviewed 2026-06-27 10:39 UTC · model grok-4.3

classification 💻 cs.LG
keywords continual learning, anomaly detection, tabular data, heterogeneous features, catastrophic forgetting, outlier exposure, dataset distillation, distribution alignment

The pith

Task-specific features can be mapped to a shared space to support continual anomaly detection on heterogeneous tabular data without catastrophic forgetting.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a method for continual anomaly detection that handles sequential arrival of data from different domains with varying feature schemas. It does so by aligning representations from each task into a common space where anomaly boundaries can be learned stably. A reader would care because many practical settings involve data streams from diverse sources, where retraining or standard continual learning fails due to input space changes. The approach combines feature alignment with data augmentation and synthetic replay to maintain performance across tasks. Evaluation on multiple datasets shows gains over fine-tuning and other baselines.

Core claim

The TaskFusion method uses an AGF model to map task-specific features into a shared space, align their distributions to reduce drift, and learn anomaly decision boundaries there. TaskFusion augmentation refines these boundaries via within-task interpolation and transfers anomaly structure across tasks via cross-task mixing. Tabular dataset distillation creates compact replay samples for outlier exposure, addressing class imbalance and memory limits. Together, these components enable substantial improvements in continual anomaly detection over sequential fine-tuning and other baselines on 21 heterogeneous datasets, with reduced forgetting and stable performance.
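The shared-space idea can be illustrated with a minimal sketch that assumes one linear encoder per task. The class name `SharedSpaceProjector`, the random initialization, and the layer shape are hypothetical; this page does not describe AGF's internals. The only point shown is that tables with different column counts end up as vectors of the same width.

```python
import random


class SharedSpaceProjector:
    """Hypothetical: one linear encoder per task, all writing into the same shared width."""

    def __init__(self, shared_dim: int, seed: int = 0):
        self.shared_dim = shared_dim
        self.rng = random.Random(seed)
        self.encoders = {}  # task_id -> shared_dim x input_dim weight matrix

    def add_task(self, task_id: str, input_dim: int) -> None:
        # Random initialization only; a real model would train these weights.
        scale = 1.0 / input_dim ** 0.5
        self.encoders[task_id] = [
            [self.rng.gauss(0.0, scale) for _ in range(input_dim)]
            for _ in range(self.shared_dim)
        ]

    def project(self, task_id: str, x: list) -> list:
        weights = self.encoders[task_id]
        if len(x) != len(weights[0]):
            raise ValueError("feature count does not match this task's schema")
        return [sum(w * v for w, v in zip(row, x)) for row in weights]


proj = SharedSpaceProjector(shared_dim=4)
proj.add_task("task_a", input_dim=30)  # e.g. a 30-column table
proj.add_task("task_b", input_dim=7)   # e.g. a 7-column table
z1 = proj.project("task_a", [0.1] * 30)
z2 = proj.project("task_b", [1.0] * 7)
print(len(z1), len(z2))  # 4 4
```

In a trained system the per-task weights would be learned jointly with the anomaly head; here they stay random so the example runs on its own.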

What carries the argument

The AGF model that maps task-specific features into a shared space then aligns distributions and learns anomaly decision boundaries in the aligned space.
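One simple way to align distributions in the shared space is per-dimension moment matching. The sketch below assumes that choice (the actual alignment loss is not specified on this page) and uses the squared distance between mean embeddings, which equals the squared MMD under a linear kernel, as a drift check. Both function names are hypothetical.

```python
from statistics import fmean, pstdev


def standardize(embeddings):
    """Shift and scale each shared-space dimension to zero mean, unit variance."""
    columns = list(zip(*embeddings))
    means = [fmean(c) for c in columns]
    stds = [pstdev(c) or 1.0 for c in columns]  # constant dimension: avoid divide-by-zero
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in embeddings]


def mean_gap(a, b):
    """Squared distance between mean embeddings (squared MMD with a linear kernel)."""
    mu_a = [fmean(c) for c in zip(*a)]
    mu_b = [fmean(c) for c in zip(*b)]
    return sum((x - y) ** 2 for x, y in zip(mu_a, mu_b))


task_a = [[10.0, 1.0], [12.0, 3.0], [14.0, 5.0]]
task_b = [[-5.0, 0.0], [-3.0, 2.0], [-1.0, 4.0]]
print(mean_gap(task_a, task_b))                            # 226.0
print(mean_gap(standardize(task_a), standardize(task_b)))  # 0.0
```

A design caveat: computing the moments over all samples lets the rare anomalies move the reference statistics; estimating them on inliers only is a plausible alternative, but which variant (if either) the paper uses is not stated here.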

If this is right

  • Substantially improves continual anomaly detection performance over sequential fine-tuning and other CL baselines.
  • Reduces catastrophic forgetting.
  • Maintains stable detection across heterogeneous datasets.
  • Handles class imbalance and memory constraints using distilled replay samples.
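The memory constraint can be made concrete with a class-balanced replay buffer that caps storage at a fixed number of instances per class (IPC) per task, the budget varied in Figure 2. The hypothetical `ReplayMemory` below simply keeps the first `ipc` real samples it sees for each (task, class) pair; in the paper the stored samples are synthetic ones produced by tabular dataset distillation, so treat the fill rule as a placeholder.

```python
from collections import defaultdict


class ReplayMemory:
    """Hypothetical class-balanced buffer: at most `ipc` samples per (task, class) pair."""

    def __init__(self, ipc: int):
        self.ipc = ipc
        self.slots = defaultdict(list)  # (task_id, label) -> stored samples

    def add_task(self, task_id, samples, labels):
        # Placeholder fill rule: keep the first `ipc` samples of each class.
        # A distilled buffer would store synthetic samples here instead.
        for x, y in zip(samples, labels):
            bucket = self.slots[(task_id, y)]
            if len(bucket) < self.ipc:
                bucket.append(x)

    def replay(self):
        """Return every stored (task_id, sample, label) triple."""
        return [(t, x, y) for (t, y), xs in self.slots.items() for x in xs]

    def size(self):
        return sum(len(xs) for xs in self.slots.values())


# 100 inliers (label 0) and 3 anomalies (label 1) from one task.
memory = ReplayMemory(ipc=5)
samples = [[float(i)] for i in range(103)]
labels = [0] * 100 + [1] * 3
memory.add_task("task_1", samples, labels)
print(memory.size())  # 8: five inliers plus all three anomalies
```

Because the cap is per class, the anomaly class is not crowded out by the majority class, which is one way a fixed budget can work against imbalance.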

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The shared space approach could extend to other machine learning tasks involving heterogeneous data streams, such as classification or regression in changing environments.
  • If the alignment preserves boundaries well, it might reduce the need for task-specific models in deployed anomaly detection systems.
  • Further work could test whether the method scales to very high-dimensional or extremely imbalanced tabular data beyond the 21 datasets evaluated.

Load-bearing premise

The AGF model can map task-specific features into a shared space and align distributions in a way that preserves anomaly decision boundaries without requiring task-specific labels or suffering from severe information loss due to heterogeneity.

What would settle it

A new experiment on additional heterogeneous tabular datasets where the method shows no improvement over baselines or exhibits significant catastrophic forgetting would falsify the central claim.
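Forgetting is usually quantified from a matrix of per-task scores. The sketch below uses one common definition (for each earlier task, the drop from its best score before the final task to its score after the final task, averaged over all but the last task); whether the paper reports exactly this metric is an assumption, and the matrix holds toy values, not results from the paper.

```python
def average_forgetting(acc):
    """acc[i][j]: score on task j after training through task i (j <= i)."""
    final = len(acc) - 1
    if final < 1:
        return 0.0
    drops = []
    for j in range(final):
        best_before_end = max(acc[i][j] for i in range(j, final))
        drops.append(best_before_end - acc[final][j])
    return sum(drops) / len(drops)


# Toy lower-triangular score matrix for three tasks.
acc = [
    [0.90],
    [0.70, 0.88],
    [0.60, 0.75, 0.91],
]
print(round(average_forgetting(acc), 3))  # 0.215
```

Values near zero mean earlier tasks kept their peak scores; large positive values correspond to the kind of repeated drops attributed to fine-tuning in Figure 4.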

Figures

Figures reproduced from arXiv: 2606.11844 by Andreas Dengel, Dayananda Herurkar, Federico Raue, Joachim Folz, Jörn Hees.

Figure 1: AGF architecture for heterogeneous continual anomaly detection. Task-specific tabular datasets with different feature dimensions are learned using AGF to perform anomaly prediction by knowledge accumulation without shared raw samples. After each task, the buffer is created via tabular dataset distillation to store compact synthetic samples and added to the replay memory that is used in subsequent tasks. Wi…
Figure 2: Influence of replay capacity measured by instances per class (IPC). Increasing IPC consistently improves performance across Balanced Accuracy, PR-AUC, and ROC-AUC. Dataset distillation achieves the strongest results across all capacities, while herding is highly sensitive to memory size and random sampling shows moderate gains. (Accompanying text, truncated: "…indicating that replay quality is as important as replay quantity [12].")
Figure 3: Conceptual illustration of continual anomaly detection across heterogeneous tabular datasets. Each task corresponds to a dataset with a different feature schema and decision boundary. In standard sequential learning (top), models trained independently on each task cannot effectively transfer anomaly knowledge across datasets. The proposed AGF with TaskFusion framework (bottom) aligns heterogeneous featu…
Figure 4: Taskwise performance evolution during continual learning. Each curve tracks a task as new datasets are introduced. Finetuning shows repeated drops due to catastrophic forgetting. AGF without OE partially reduces degradation, while AGF with OE and augmentation maintains nearly stable performance across tasks.
A.5 Effect of Long Sequences
Figure 5: Performance evolution over long task sequences. The plots show average performance across previously learned tasks as new datasets are introduced. Finetuning rapidly deteriorates due to catastrophic forgetting. AGF without OE degrades after several tasks, while OE and augmentation stabilize learning and maintain consistently high performance across all metrics.
Figure 6: Evolution of the learned latent G-space across the continual learning sequence. Each row corresponds to a task, while columns show the embedding of the same task after subsequent tasks are learned. Blue and red points denote inliers and outliers, respectively. The overall geometry of previously learned tasks remains stable over time, and inlier–outlier separation is preserved, indicating minimal representa…
Figure 7: Anomaly score stability across tasks. For each task, histograms compare anomaly score distributions at learning time (blue) and after completion of the full task sequence (orange). The strong overlap between distributions indicates that decision boundaries remain stable over continual updates, demonstrating preservation of previously learned anomaly detection behavior.
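Figure 2 credits dataset distillation with the best replay quality at every capacity. As a rough illustration, the sketch below follows the distribution-matching idea of Zhao and Bilen [26]: synthetic samples are moved so that their mean embedding under a freshly sampled random network matches that of the real samples. To stay short and dependency-free it uses random linear embeddings, under which only the class mean gets matched; a faithful version would use random nonlinear networks. The function name and hyperparameters are hypothetical, and this is not claimed to be the paper's tabular distillation procedure.

```python
import random


def distill_class(real, ipc, steps=200, lr=0.1, embed_dim=8, seed=0):
    """Distribution-matching sketch: learn `ipc` synthetic rows for one class."""
    rng = random.Random(seed)
    d = len(real[0])
    mu_real = [sum(col) / len(real) for col in zip(*real)]
    synth = [list(rng.choice(real)) for _ in range(ipc)]  # initialize from copies of real rows

    for _ in range(steps):
        # Sample a fresh random linear embedding W (embed_dim x d) each step.
        W = [[rng.gauss(0.0, d ** -0.5) for _ in range(d)] for _ in range(embed_dim)]
        mu_synth = [sum(col) / ipc for col in zip(*synth)]
        diff = [s - r for s, r in zip(mu_synth, mu_real)]
        # Loss ||W diff||^2; its gradient w.r.t. mu_synth is 2 W^T W diff.
        w_diff = [sum(w * v for w, v in zip(row, diff)) for row in W]
        grad_mu = [2.0 * sum(W[k][i] * w_diff[k] for k in range(embed_dim)) for i in range(d)]
        # Each synthetic row contributes 1/ipc of the mean, so it receives grad_mu / ipc.
        for row in synth:
            for i in range(d):
                row[i] -= lr * grad_mu[i] / ipc
    return synth


inliers = [[1.0, 2.0, 0.5], [1.2, 1.8, 0.7], [0.8, 2.2, 0.4],
           [1.1, 2.1, 0.6], [0.9, 1.9, 0.5], [1.0, 2.0, 0.6]]
buffer = distill_class(inliers, ipc=2)
print([round(sum(c) / len(c), 3) for c in zip(*buffer)])  # close to the real class mean
```

Matching only the mean is exactly why the linear version is a toy: richer embeddings make the synthetic rows match higher-order structure of the class, which is what a distilled buffer needs to be useful for replay.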
Original abstract

Continual anomaly detection in tabular data is challenging and remains largely underexplored, particularly in settings with heterogeneous feature schemas, distribution shifts, and severe class imbalance. In many real-world applications, data arrive sequentially from diverse domains, rendering conventional continual learning methods ineffective due to their reliance on a fixed input space. We propose a continual learning (CL) method that can overcome these challenges and continually learn from different tasks. Our method consists of three main parts: our AGF model, Taskfusion augmentation, and outlier exposure. The AGF model maps task-specific features into a shared space, then aligns distributions to reduce representation drift, and learns anomaly decision boundaries in the aligned space. To improve stability, we introduce Taskfusion augmentation, combining boundary-aware interpolation within tasks to refine the model's anomaly boundaries and cross-task mixing to transfer anomaly structure across datasets. To handle class imbalance and memory constraints, we employ tabular dataset distillation to store compact synthetic replay samples, which are jointly used with augmented data in an outlier exposure objective for robust anomaly detection. We evaluate the approach on 21 heterogeneous datasets across multiple domains. Results show that our approach substantially improves continual anomaly detection performance over sequential fine-tuning and other CL baselines while reducing catastrophic forgetting and maintaining stable detection across heterogeneous datasets.
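The two augmentation modes described in the abstract resemble mixup [25] applied in the shared space. The sketch below assumes exactly that: within a task, an inlier is interpolated with an outlier, giving a soft-labelled point between the two classes; across tasks, labelled embeddings from two datasets are interpolated. The Beta(alpha, alpha) mixing weight, alpha = 0.4, and all function names are hypothetical choices, not details confirmed by the paper.

```python
import random


def mixup(z_a, y_a, z_b, y_b, alpha=0.4, rng=random):
    """Interpolate two shared-space points and their (soft) anomaly labels."""
    lam = rng.betavariate(alpha, alpha)
    z = [lam * a + (1.0 - lam) * b for a, b in zip(z_a, z_b)]
    return z, lam * y_a + (1.0 - lam) * y_b


def within_task_mix(inliers, outliers, n, alpha=0.4, rng=random):
    """Inlier-outlier pairs from one task: mixed points fall between the classes."""
    return [mixup(rng.choice(inliers), 0.0, rng.choice(outliers), 1.0, alpha, rng)
            for _ in range(n)]


def cross_task_mix(task_a, task_b, n, alpha=0.4, rng=random):
    """Pairs of labelled (embedding, label) items drawn from two different tasks."""
    mixed = []
    for _ in range(n):
        z_a, y_a = rng.choice(task_a)
        z_b, y_b = rng.choice(task_b)
        mixed.append(mixup(z_a, y_a, z_b, y_b, alpha, rng))
    return mixed


rng = random.Random(0)
boundary = within_task_mix([[0.0, 0.0]], [[1.0, 1.0]], n=3, rng=rng)
for z, y in boundary:
    print([round(v, 2) for v in z], round(y, 2))  # coordinates equal the soft label here
```

Soft labels let a downstream loss treat interpolated points as partially anomalous rather than forcing a hard 0/1 target; whether the paper uses soft or hard labels for mixed samples is not stated on this page.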

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces TaskFusion for continual anomaly detection on heterogeneous tabular data. The method comprises an AGF model that maps task-specific features into a shared space and performs distribution alignment to reduce drift while learning anomaly boundaries; Taskfusion augmentation that combines boundary-aware intra-task interpolation with cross-task mixing; and tabular dataset distillation to generate compact replay samples used jointly with augmented data in an outlier exposure objective. The approach is evaluated on 21 heterogeneous datasets and is claimed to substantially outperform sequential fine-tuning and other continual learning baselines while mitigating catastrophic forgetting.
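The outlier exposure objective in this summary follows Hendrycks et al. [8] in spirit: the detector is trained on the current task while exposed samples (here, replayed and augmented ones) push anomaly scores toward their targets. Below is a minimal sketch of such a loss with binary cross-entropy and soft targets; the weight `lam`, the BCE form, and the function names are assumptions, since the exact loss is not given in this report.

```python
import math


def bce(p, y, eps=1e-7):
    """Binary cross-entropy for a predicted anomaly probability p and a (soft) target y."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def oe_objective(score_fn, current, exposed, lam=0.5):
    """Current-task loss plus a weighted outlier-exposure term.

    current: (x, y) pairs from the task being learned, y in {0, 1}
    exposed: (x, y) pairs from replay and augmentation, y may be soft
    """
    task_loss = sum(bce(score_fn(x), y) for x, y in current) / len(current)
    oe_loss = sum(bce(score_fn(x), y) for x, y in exposed) / len(exposed)
    return task_loss + lam * oe_loss


def uninformed(x):
    return 0.5  # same anomaly probability for every input


def sharp(x):
    # Toy scorer: low for small x, uncertain in the middle, high for large x.
    if x[0] < 0.4:
        return 0.05
    return 0.5 if x[0] < 0.7 else 0.95


current = [([0.1], 0.0), ([0.2], 0.0), ([0.9], 1.0)]
exposed = [([0.8], 1.0), ([0.5], 0.5)]
print(oe_objective(uninformed, current, exposed) > oe_objective(sharp, current, exposed))  # True
```

Keeping the exposure term as a separately weighted average means a small replay buffer is not drowned out by a large current task, which matters when the anomaly class is rare.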

Significance. If the empirical claims hold under rigorous scrutiny, the work would address a genuinely underexplored setting: continual anomaly detection under non-overlapping feature schemas and severe imbalance, where standard CL methods fail due to input-space mismatch. The combination of explicit alignment, augmentation, and distillation offers a concrete technical path that could be adopted in domains with heterogeneous tabular data streams.

major comments (2)
  1. [AGF model and alignment procedure] The central claim that the AGF mapping plus alignment step produces a shared representation whose anomaly decision boundaries remain faithful to each task’s original data is load-bearing, yet the manuscript provides no direct measurement (e.g., AUROC or precision-recall on held-out anomalies computed before versus after the mapping/alignment). Without this comparison it is impossible to separate the contribution of the shared-space component from the augmentation and replay mechanisms.
  2. [Abstract and experimental claims] The abstract asserts “substantial improvements … on 21 heterogeneous datasets” and “stable detection across heterogeneous datasets,” but the provided summary contains no quantitative tables, error bars, ablation results, or experimental protocol details. The data-to-claim link therefore cannot be assessed from the material supplied.
minor comments (2)
  1. Clarify whether the AGF alignment objective is supervised or unsupervised and how task-specific anomaly labels (if any) are used during alignment.
  2. Specify the exact memory budget and number of replay samples per task so that comparisons with other replay-based CL baselines are reproducible.
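The before-versus-after comparison requested in major comment 1 could be run with a rank-based ROC-AUC, computed once on scores from a detector fit on the raw task features and once on scores from the aligned shared space, over the same held-out anomalies. The sketch below implements ROC-AUC through its Mann-Whitney interpretation; the score lists are invented toy values for illustration only, not results from the paper.

```python
def roc_auc(scores, labels):
    """Probability that a random anomaly (label 1) outscores a random inlier (label 0); ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one anomaly and one inlier")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [0, 0, 0, 1, 1]
raw_scores = [0.10, 0.40, 0.35, 0.80, 0.30]      # toy: detector on raw features
aligned_scores = [0.10, 0.20, 0.30, 0.90, 0.70]  # toy: detector on aligned embeddings
print(round(roc_auc(raw_scores, labels), 3), roc_auc(aligned_scores, labels))  # 0.667 1.0
```

The quadratic pairwise loop is fine for a sketch; for large evaluation sets a sort-based rank computation, or a library routine such as scikit-learn's `roc_auc_score`, would be the practical choice.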

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below, providing clarifications based on the manuscript content and indicating planned revisions where appropriate.

  1. Referee: [AGF model and alignment procedure] The central claim that the AGF mapping plus alignment step produces a shared representation whose anomaly decision boundaries remain faithful to each task’s original data is load-bearing, yet the manuscript provides no direct measurement (e.g., AUROC or precision-recall on held-out anomalies computed before versus after the mapping/alignment). Without this comparison it is impossible to separate the contribution of the shared-space component from the augmentation and replay mechanisms.

    Authors: We agree that isolating the AGF mapping and alignment contribution via direct before-versus-after AUROC or precision-recall measurements on held-out anomalies would strengthen the analysis. The manuscript reports end-to-end continual anomaly detection results across 21 datasets that demonstrate gains over sequential fine-tuning and other baselines, which supports the overall pipeline including alignment. To better separate components, we will add the requested before/after comparisons in the revised manuscript. revision: yes

  2. Referee: [Abstract and experimental claims] The abstract asserts “substantial improvements … on 21 heterogeneous datasets” and “stable detection across heterogeneous datasets,” but the provided summary contains no quantitative tables, error bars, ablation results, or experimental protocol details. The data-to-claim link therefore cannot be assessed from the material supplied.

    Authors: The abstract is a concise summary of the key results. The full manuscript contains the supporting quantitative tables, error bars, ablation studies, and detailed experimental protocols for the 21 heterogeneous datasets. These establish the performance gains and stability claims. We can expand quantitative highlights in the abstract if requested. revision: partial

Circularity Check

0 steps flagged

No derivation chain or self-referential steps present

full rationale

The paper describes an empirical continual learning method (AGF model for shared-space mapping, Taskfusion augmentation, outlier exposure, and dataset distillation) evaluated on 21 heterogeneous tabular datasets. No equations, first-principles derivations, fitted parameters presented as predictions, or load-bearing self-citations appear in the provided text. Claims rest on experimental comparisons to baselines rather than any reduction of outputs to inputs by construction. The method is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated or derivable from the provided text.

pith-pipeline@v0.9.1-grok · 5767 in / 1131 out tokens · 24555 ms · 2026-06-27T10:39:33.378261+00:00 · methodology


Reference graph

Works this paper leans on

26 extracted references · 13 canonical work pages

  1. [1]

    Bahri, D., Jiang, H., Tay, Y., Metzler, D.: Scarf: Self-supervised contrastive learning using random feature corruption (2022), https://arxiv.org/abs/2106.15147

  2. [2]

    Dong, H., Frusque, G., Zhao, Y., Chatzi, E., Fink, O.: NNG-Mix: Improving semi-supervised anomaly detection with pseudo-anomaly generation. IEEE Transactions on Neural Networks and Learning Systems 36(6), 10635–10647 (2025). https://doi.org/10.1109/TNNLS.2024.3497801

  3. [3]

    Du, M., Chen, Z., Liu, C., Oak, R., Song, D.: Lifelong anomaly detection through unlearning. In: Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security. pp. 1283–1297. CCS ’19, Association for Computing Machinery, New York, NY, USA (2019). https://doi.org/10.1145/3319535.3363226

  4. [4]

    Faber, K., Corizzo, R., Sniezynski, B., Japkowicz, N.: Lifelong continual learning for anomaly detection: New challenges, perspectives, and insights. IEEE Access 12, 41364–41380 (2024). https://doi.org/10.1109/ACCESS.2024.3377690

  5. [5]

    Fini, E., Da Costa, V.G.T., Alameda-Pineda, X., Ricci, E., Alahari, K., Mairal, J.: Self-supervised models are continual learners. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 9611–9620 (2022). https://doi.org/10.1109/CVPR52688.2022.00940

  6. [6]

    Frikha, A., Krompass, D., Tresp, V.: ARCADe: A Rapid Continual Anomaly Detector. In: 2020 25th International Conference on Pattern Recognition (ICPR). pp. 10449–10456. IEEE Computer Society, Los Alamitos, CA, USA (Jan 2021). https://doi.org/10.1109/ICPR48806.2021.9412627

  7. [7]

    Han, S., Hu, X., Huang, H., Jiang, M., Zhao, Y.: ADBench: Anomaly detection benchmark. Advances in Neural Information Processing Systems 35, 32142–32159 (2022)

  8. [8]

    Hendrycks, D., Mazeika, M., Dietterich, T.: Deep anomaly detection with outlier exposure (2019), https://arxiv.org/abs/1812.04606

  9. [9]

    Herurkar, D., Hees, J., Tzvetkov, V., Dengel, A.: Tabular data adapters: Pseudo-labeling unlabeled private tabular data for outlier detection. IEEE Access 14, 25691–25705 (2026). https://doi.org/10.1109/ACCESS.2026.3663975

  10. [10]

    Herurkar, D., Meier, M., Hees, J.: Recol: Reconstruction error columns for outlier detection. In: KI 2023: Advances in Artificial Intelligence: 46th German Conference on AI, Berlin, Germany, September 26–29, 2023, Proceedings. pp. 60–74. Springer-Verlag, Berlin, Heidelberg (2023). https://doi.org/10.1007/978-3-031-42608-7_6

  11. [11]

    Herurkar, D., Palacio, S.M., Anwar, A., Hees, J., Dengel, A.: Fin-fed-od: Federated outlier detection on financial tabular data. ArXiv abs/2404.14933 (2024), https://api.semanticscholar.org/CorpusID:269303016

  12. [12]

    Herurkar, D., Raue, F., Dengel, A.: Tab-distillation: Impacts of dataset distillation on tabular data for outlier detection. In: Proceedings of the 5th ACM International Conference on AI in Finance. pp. 804–812. ICAIF ’24, Association for Computing Machinery, New York, NY, USA (2024). https://doi.org/10.1145/3677052.3698660

  13. [13]

    Herurkar, D., Sattarov, T., Hees, J., Palacio, S., Raue, F., Dengel, A.: Cross-domain transformation for outlier detection on tabular datasets. In: International Joint Conference on Neural Networks, IJCNN 2023, Gold Coast, Australia, June 18-23, 2023. pp. 1–8. IEEE (2023). https://doi.org/10.1109/IJCNN54540.2023.10191326

  14. [14]

    Hilal, W., Gadsden, S.A., Yawney, J.: Financial fraud: A review of anomaly detection techniques and recent advances. Expert Systems with Applications 193, 116429 (2022). https://doi.org/10.1016/j.eswa.2021.116429

  15. [15]

    King, S., Zhang, Z., Yu, R., Coskun, B., Ding, W., Cui, Q.: Contextual learning for anomaly detection in tabular data (2025), https://arxiv.org/abs/2509.09030

  16. [16]

    Kirkpatrick, J., Pascanu, R., Rabinowitz, N., Veness, J., Desjardins, G., Rusu, A.A., Milan, K., Quan, J., Ramalho, T., Grabska-Barwinska, A., Hassabis, D., Clopath, C., Kumaran, D., Hadsell, R.: Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences 114(13), 3521–3526 (2017). https://doi.org/10.1073/pnas...

  17. [17]

    Li, Z., Hoiem, D.: Learning without forgetting (2017), https://arxiv.org/abs/1606.09282

  18. [18]

    Liu, H., Di, S., Li, H., Li, S., Chen, L., Zhou, X.: Effective data selection and replay for unsupervised continual learning. In: 2024 IEEE 40th International Conference on Data Engineering (ICDE). pp. 1449–1463 (2024). https://doi.org/10.1109/ICDE60146.2024.00119

  19. [19]

    Rebuffi, S.A., Kolesnikov, A., Sperl, G., Lampert, C.H.: iCaRL: Incremental classifier and representation learning. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 5533–5542 (2017). https://doi.org/10.1109/CVPR.2017.587

  20. [20]

    Thimonier, H., Popineau, F., Rimmel, A., Doan, B.L.: Retrieval augmented deep anomaly detection for tabular data. In: Proceedings of the 33rd ACM International Conference on Information and Knowledge Management. pp. 2250–2259. CIKM ’24, Association for Computing Machinery, New York, NY, USA (2024). https://doi.org/10.1145/3627673.3679559

  21. [21]

    Thimonier, H., Popineau, F., Rimmel, A., Doan, B.L., Daniel, F.: TracInAD: Measuring influence for anomaly detection. 2022 International Joint Conference on Neural Networks (IJCNN) pp. 1–6 (2022), https://api.semanticscholar.org/CorpusID:248505883

  22. [22]

    Verma, V., Lamb, A., Beckham, C., Najafi, A., Mitliagkas, I., Lopez-Paz, D., Bengio, Y.: Manifold mixup: Better representations by interpolating hidden states. In: International Conference on Machine Learning (2018), https://api.semanticscholar.org/CorpusID:59604501

  23. [23]

    Xu, H., Pang, G., Wang, Y., Wang, Y.: Deep isolation forest for anomaly detection. IEEE Transactions on Knowledge and Data Engineering 35(12), 12591–12604 (2023). https://doi.org/10.1109/TKDE.2023.3270293

  24. [24]

    Yin, J., Qiao, Y., Zhou, Z., Wang, X., Yang, J.: MCM: Masked cell modeling for anomaly detection in tabular data. In: The Twelfth International Conference on Learning Representations (2024), https://openreview.net/forum?id=lNZJyEDxy4

  25. [25]

    Zhang, H., Cissé, M., Dauphin, Y.N., Lopez-Paz, D.: mixup: Beyond empirical risk minimization. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net (2018), https://openreview.net/forum?id=r1Ddp1-Rb

  26. [26]

    Zhao, B., Bilen, H.: Dataset condensation with distribution matching. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. pp. 6514–6523 (2023)

A Appendix

To further understand the behavior of the proposed framework under long continual learning sequences, we provide additional qualitative and distributional analyses of t...