arxiv: 2606.01221 · v1 · pith:YCOLMHIOnew · submitted 2026-05-31 · 💻 cs.LG · cs.AI

Hybrid Imbalanced Regression Through Unified Data-Level and Algorithm-Level Balancing

Pith reviewed 2026-06-28 17:42 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords imbalanced regression · hybrid balancing · data-level balancing · algorithm-level balancing · conditional variational autoencoder · latent-density weighted loss · adaptive binning · benchmark datasets

The pith

A five-stage hybrid pipeline unifies data-level oversampling and algorithm-level weighted loss to improve regression on rare target values.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to show that combining data balancing and algorithm adjustments in one regressor-agnostic pipeline overcomes the noise problems of pure data methods and the complexity-handling limits of pure algorithm methods. This matters for tasks where models must predict infrequent but critical target values without bias from common cases. The approach segments targets adaptively, learns representations with a conditional variational autoencoder, oversamples minority clusters, applies a latent-density weighted loss, and fuses outputs via attention gating. Experimental results on benchmarks support consistent gains over standalone regressors and prior imbalanced regression techniques.
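
The data-level half of such a pipeline can be pictured with a minimal sketch. It is not the authors' method: equal-width target bins stand in for the adaptive partitioning, no CVAE or clustering is involved, and the function name `oversample_rare_bins` and its parameters are hypothetical. It only illustrates the generic idea of interpolating new samples inside under-populated target ranges.

```python
import random
from collections import defaultdict


def oversample_rare_bins(X, y, n_bins=5, seed=0):
    """Interpolate new (x, y) pairs inside under-populated target bins.

    Assumption: equal-width bins replace the paper's adaptive bins.
    """
    rng = random.Random(seed)
    lo, hi = min(y), max(y)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if all targets are equal
    bins = defaultdict(list)
    for xi, yi in zip(X, y):
        idx = min(int((yi - lo) / width), n_bins - 1)
        bins[idx].append((xi, yi))

    # Grow every non-empty bin to the size of the most populated one.
    target = max(len(members) for members in bins.values())
    X_new, y_new = list(X), list(y)
    for members in bins.values():
        if len(members) < 2:
            continue  # interpolation needs two distinct samples
        for _ in range(target - len(members)):
            (xa, ya), (xb, yb) = rng.sample(members, 2)
            t = rng.random()
            X_new.append([a + t * (b - a) for a, b in zip(xa, xb)])
            y_new.append(ya + t * (yb - ya))
    return X_new, y_new
```

Interpolating only between samples of the same bin keeps synthetic targets inside the rare range, but it is exactly the kind of step that can amplify noise when a rare bin contains outliers, which is the risk the paper attributes to pure data-level methods.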

Core claim

The central claim is that a regressor-agnostic five-stage framework consistently improves predictive performance on benchmark datasets compared with standalone regressors and existing imbalanced regression approaches. The framework integrates adaptive bin partitioning based on local linear coherence, target-conditioned representation learning with a conditional variational autoencoder, multistage feature-space clustering and oversampling, a novel latent-density weighted loss that emphasizes rare samples, and attention-based gated fusion.

What carries the argument

The five-stage hybrid balancing pipeline that performs adaptive bin partitioning, conditional variational autoencoder representation learning, clustering-based oversampling, latent-density weighted loss, and gated fusion.

Load-bearing premise

The specific five-stage pipeline of adaptive binning, conditional variational autoencoder, clustering oversampling, latent-density weighted loss, and gated fusion avoids introducing noise or overfitting while handling complex target distributions.

What would settle it

Results on a benchmark dataset with highly complex multimodal target distributions where the hybrid pipeline shows no performance gain or degrades accuracy relative to existing methods.

Figures

Figures reproduced from arXiv: 2606.01221 by Hossein Mohammadi, Mohsen Afsharchi, Shermin Shahbazi.

Figure 1: The general architecture of the proposed imbalanced learning system
Figure 2: Visualization of proposed adaptive bin partitioning method on applied datasets. The histogram and kernel …
Figure 2 (continued)
Figure 3: Visual comparison of isolated data-level and algorithm-level balancing methods versus the hybrid pipeline (the …
Figure 4: Impact of target variable skewness on hybrid pipeline performance across applied datasets. The points and trend …
original abstract

Imbalanced learning is a critical challenge in machine learning, where underrepresented target values can bias models and degrade prediction performance on rare but important cases. Although extensively studied in classification, imbalanced regression remains relatively underexplored. Existing methods mainly focus on either data-level balancing, which may introduce noise and overfitting, or algorithm-level balancing, which often struggles with highly complex target distributions. To address these limitations, we propose a unified hybrid framework that integrates both data- and algorithm-level balancing strategies into a regressor-agnostic pipeline. The proposed framework consists of five stages: (1) adaptive bin partitioning to dynamically segment the target space based on local linear coherence; (2) target-conditioned representation learning using a Conditional Variational Autoencoder; (3) multistage data-level balancing through feature-space clustering and oversampling of minority clusters; (4) algorithm-level balancing using a novel Latent-Density Weighted Loss (LDWL) to emphasize rare samples in latent and target spaces; and (5) attention-based gated fusion for final regression. Experimental results on benchmark datasets demonstrate that the proposed framework consistently improves predictive performance compared to standalone regressors and existing imbalanced regression approaches.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper proposes a hybrid framework for imbalanced regression that unifies data-level and algorithm-level balancing in a five-stage pipeline: (1) adaptive bin partitioning based on local linear coherence, (2) target-conditioned representation learning via Conditional Variational Autoencoder, (3) multistage data-level balancing via feature-space clustering and oversampling of minority clusters, (4) algorithm-level balancing with a novel Latent-Density Weighted Loss (LDWL), and (5) attention-based gated fusion. It claims that this regressor-agnostic approach yields consistent predictive improvements on benchmark datasets relative to standalone regressors and prior imbalanced regression methods.

Significance. If the empirical claims hold under rigorous controls, the work could provide a practical engineering template for handling complex target distributions in regression by mitigating the respective weaknesses of pure data-level (noise/overfitting) and algorithm-level (struggle with complexity) strategies.

major comments (2)
  1. [Experimental Results] (including associated tables/figures): The central claim of 'consistent improvements' is asserted without reported dataset names or characteristics, baseline implementations, hyperparameter settings, statistical tests (e.g., paired t-tests or Wilcoxon), ablation results isolating each stage, or error bars across multiple runs. These omissions make it impossible to evaluate whether the five-stage pipeline actually avoids the noise/overfitting risks highlighted in the introduction.
  2. [§4] (LDWL definition): The Latent-Density Weighted Loss is introduced as a novel algorithm-level component, yet its weighting coefficients appear among the free parameters listed in the axiom ledger; without an explicit derivation showing that the loss is not simply a reparameterized form of existing density-weighted losses, the novelty and independence from the adaptive partitioning thresholds cannot be assessed.
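
The paired significance testing named in the first major comment could be run along the following lines. This is a hedged sketch, not anything taken from the paper: the function name `wilcoxon_signed_rank_exact` is hypothetical, and it implements an exact two-sided Wilcoxon signed-rank test by enumerating sign patterns, which is only practical for a small number of paired scores (for example, one error value per benchmark dataset).

```python
from itertools import product


def wilcoxon_signed_rank_exact(a, b):
    """Exact two-sided Wilcoxon signed-rank test for paired scores.

    Returns (W_plus, p_value). Zero differences are dropped and tied
    absolute differences receive average ranks. Cost grows as 2**n.
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0

    # Rank absolute differences, averaging ranks within ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    expected = sum(ranks) / 2
    observed_dev = abs(w_plus - expected)

    # Under the null hypothesis every sign pattern is equally likely.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - expected) >= observed_dev - 1e-12:
            extreme += 1
    return w_plus, extreme / 2 ** n
```

Calling `wilcoxon_signed_rank_exact(baseline_errors, hybrid_errors)` with one entry per dataset gives the rank sum of the positive differences and the exact p-value. For larger samples, library routines such as `scipy.stats.wilcoxon` and `scipy.stats.ttest_rel` are the more practical choice.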

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight areas where additional rigor will strengthen the manuscript. We address each major comment below and have prepared revisions to incorporate the requested details and clarifications.

point-by-point responses
  1. Referee: [Experimental Results] (including associated tables/figures): The central claim of 'consistent improvements' is asserted without reported dataset names or characteristics, baseline implementations, hyperparameter settings, statistical tests (e.g., paired t-tests or Wilcoxon), ablation results isolating each stage, or error bars across multiple runs. These omissions make it impossible to evaluate whether the five-stage pipeline actually avoids the noise/overfitting risks highlighted in the introduction.

    Authors: We agree that the experimental section requires these details for rigorous evaluation. In the revised manuscript we will: (i) explicitly name and characterize all benchmark datasets (including target distribution statistics), (ii) document baseline implementations and hyperparameter selection procedures, (iii) report results accompanied by statistical significance tests (paired t-tests and Wilcoxon signed-rank), (iv) present ablation studies that isolate the contribution of each of the five stages, and (v) include error bars computed over multiple independent runs. These additions will directly address concerns about noise/overfitting and allow readers to assess the pipeline's robustness.

    revision: yes

  2. Referee: [§4] (LDWL definition): The Latent-Density Weighted Loss is introduced as a novel algorithm-level component, yet its weighting coefficients appear among the free parameters listed in the axiom ledger; without an explicit derivation showing that the loss is not simply a reparameterized form of existing density-weighted losses, the novelty and independence from the adaptive partitioning thresholds cannot be assessed.

    Authors: We will expand §4 with a formal derivation of LDWL that starts from the latent density estimates produced by the CVAE and shows how the weighting function differs from standard density-weighted losses (e.g., those based solely on target-space density). The derivation will also demonstrate that the coefficients are not free parameters but are deterministically obtained from the latent representations, and we will explicitly discuss their relationship to the adaptive bin-partitioning thresholds to establish independence. This material will be added as a new subsection with supporting equations.

    revision: yes
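
For orientation, the "standard density-weighted loss based solely on target-space density" that the rebuttal contrasts LDWL with can be sketched as follows. This is an illustrative baseline under stated assumptions, not the paper's LDWL: the Gaussian kernel, the fixed `bandwidth`, the mean-one normalization, and all function names are choices made here for the example.

```python
import math


def kde_density(values, bandwidth):
    """Gaussian kernel density estimate evaluated at each value in `values`."""
    n = len(values)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((v - u) / bandwidth) ** 2) for u in values)
        for v in values
    ]


def inverse_density_weights(y, bandwidth=1.0):
    """Weights proportional to 1 / target density, rescaled to have mean 1."""
    inv = [1.0 / d for d in kde_density(y, bandwidth)]
    mean_inv = sum(inv) / len(inv)
    return [w / mean_inv for w in inv]


def weighted_mse(y_true, y_pred, weights):
    """Weighted mean squared error; weights are expected to average to 1."""
    total = sum(w * (t - p) ** 2 for w, t, p in zip(weights, y_true, y_pred))
    return total / len(y_true)
```

Samples whose targets fall in sparse regions receive larger weights, so their errors dominate the loss. A derivation of LDWL would need to show what the latent-space density term adds beyond this target-only weighting.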

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper describes an empirical five-stage pipeline (adaptive bin partitioning, CVAE, clustering oversampling, LDWL loss, gated fusion) for imbalanced regression and supports its claims solely through benchmark experiments showing performance gains. No mathematical derivation chain, self-definitional equations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided text. The LDWL and partitioning steps are presented as engineering choices whose validity rests on external validation metrics rather than reducing to the inputs by construction. The central claim remains falsifiable via the reported comparisons and does not rely on any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 1 invented entity

Ledger is inferred solely from the abstract's description of the five stages; full parameter lists, derivation steps, and validation of assumptions are unavailable.

free parameters (2)
  • adaptive bin partitioning thresholds
    Parameters controlling dynamic segmentation of target space by local linear coherence.
  • LDWL weighting coefficients
    Weights used to emphasize rare samples in latent and target spaces.
axioms (1)
  • domain assumption: Local linear coherence provides a reliable basis for segmenting the target space without introducing bias
    Invoked to justify the first stage of adaptive bin partitioning.
invented entities (1)
  • Latent-Density Weighted Loss (LDWL) · no independent evidence
    purpose: Emphasize rare samples simultaneously in latent and target spaces
    Novel loss function introduced as part of the algorithm-level balancing stage.

pith-pipeline@v0.9.1-grok · 5740 in / 1292 out tokens · 41283 ms · 2026-06-28T17:42:27.924762+00:00 · methodology


Reference graph

Works this paper leans on

108 extracted references · 60 canonical work pages

  1. [6]

    Journal of artificial intelligence research , volume=

    SMOTE: synthetic minority over-sampling technique , author=. Journal of artificial intelligence research , volume=. 2002 , doi =

  2. [7]

    , title =

    Branco, Paula and Torgo, Luís and Ribeiro, Rita P. , title =. 2017 , editor =

  3. [13]

    Data Augmentation with Variational Autoencoder for Imbalanced Dataset

    Stocksieker, Samuel and Pommeret, Denys and Charpentier, Arthur. Data Augmentation with Variational Autoencoder for Imbalanced Dataset. Neural Information Processing. 2025

  4. [15]

    and Ghoraani, Behnaz , journal=

    Hssayeni, Murtadha D. and Ghoraani, Behnaz , journal=. Deep Regression Modeling for Imbalanced and Incomplete Time-Series Data , year=

  5. [16]

    Advances in neural information processing systems , volume=

    Modeling tabular data using conditional gan , author=. Advances in neural information processing systems , volume=

  6. [22]

    IRMAE-AKDE: A Novel Solution to Deep Imbalanced Regression for Performance Prediction of Rolled Steel Plate , year=

    Zhang, Yufei and Zhan, Chenlu and Peng, Gongzhuang and Wang, Hongwei , journal=. IRMAE-AKDE: A Novel Solution to Deep Imbalanced Regression for Performance Prediction of Rolled Steel Plate , year=

  7. [23]

    Journal of Artificial Intelligence Research , volume=

    A Selective Under-Sampling (SUS) Method For Imbalanced Regression , author=. Journal of Artificial Intelligence Research , volume=. 2025 , url =

  8. [26]

    Spatial Distribution-Based Imbalanced Undersampling , year=

    Yan, Yuanting and Zhu, Yuanwei and Liu, Ruiqing and Zhang, Yiwen and Zhang, Yanping and Zhang, Ling , journal=. Spatial Distribution-Based Imbalanced Undersampling , year=

  9. [30]

    2018 , publisher=

    Learning from imbalanced data sets , author=. 2018 , publisher=

  10. [35]

    Proceedings of the 38th International Conference on Machine Learning , pages =

    Delving into Deep Imbalanced Regression , author =. Proceedings of the 38th International Conference on Machine Learning , pages =. 2021 , editor =

  11. [36]

    2022 , editor =

    Gong, Yu and Mori, Greg and Tung, Fred , booktitle =. 2022 , editor =

  12. [37]

    Rank-N-Contrast: Learning Continuous Representations for Regression , url =

    Zha, Kaiwen and Cao, Peng and Son, Jeany and Yang, Yuzhe and Katabi, Dina , booktitle =. Rank-N-Contrast: Learning Continuous Representations for Regression , url =

  13. [39]

    2025 , publisher=

    Ensemble methods: foundations and algorithms , author=. 2025 , publisher=

  14. [42]

    Sensors , VOLUME =

    Isabona, Joseph and Imoize, Agbotiname Lucky and Kim, Yongsung , TITLE =. Sensors , VOLUME =. 2022 , NUMBER =

  15. [43]

    Proceedings of the First International Workshop on Learning with Imbalanced Domains: Theory and Applications , pages =

    Evaluation of Ensemble Methods in Imbalanced Regression Tasks , author =. Proceedings of the First International Workshop on Learning with Imbalanced Domains: Theory and Applications , pages =. 2017 , editor =

  16. [44]

    Proceedings of the Second International Workshop on Learning with Imbalanced Domains: Theory and Applications , pages =

    REBAGG: REsampled BAGGing for Imbalanced Regression , author =. Proceedings of the Second International Workshop on Learning with Imbalanced Domains: Theory and Applications , pages =. 2018 , editor =

  17. [45]

    SMOTEBoost for Regression: Improving the Prediction of Extreme Values , year=

    Moniz, Nuno and Ribeiro, Rita and Cerqueira, Vitor and Chawla, Nitesh , booktitle=. SMOTEBoost for Regression: Improving the Prediction of Extreme Values , year=

  18. [49]

    Information Sciences , year=

    Sun, Jie and Lang, Jie and Fujita, Hamido and Li, Hui , title=. Information Sciences , year=

  19. [57]

    Scheepens, D. R. and Schicker, I. and Hlav\'a. Adapting a deep convolutional RNN model with imbalanced regression loss for improved spatio-temporal forecasting of extreme wind speed events in the short to medium range , JOURNAL =. 2023 , NUMBER =

  20. [58]

    A Novel Continuous Blood Pressure Estimation Approach Based on Data Mining Techniques , year=

    Miao, Fen and Fu, Nan and Zhang, Yuan-Ting and Ding, Xiao-Rong and Hong, Xi and He, Qingyun and Li, Ye , journal=. A Novel Continuous Blood Pressure Estimation Approach Based on Data Mining Techniques , year=

  21. [59]

    Icml , volume=

    Addressing the curse of imbalanced training sets: one-sided selection , author=. Icml , volume=. 1997 , organization=

  22. [61]

    Water , VOLUME =

    Candelieri, Antonio , TITLE =. Water , VOLUME =. 2017 , NUMBER =

  23. [62]

    Machine Learning for anomaly detection

    Hajjami, Salma El and Malki, Jamal and Berrada, Mohammed and Fourka, Bouziane , booktitle=. Machine Learning for anomaly detection. Performance study considering anomaly distribution in an imbalanced dataset , year=

  24. [63]

    Energies , VOLUME =

    Lucas, Alexandre and Pegios, Konstantinos and Kotsakis, Evangelos and Clarke, Dan , TITLE =. Energies , VOLUME =. 2020 , NUMBER =

  25. [67]

    Low-Dimensional Representation Learning from Imbalanced Data Streams

    Korycki, ukasz and Krawczyk, Bartosz. Low-Dimensional Representation Learning from Imbalanced Data Streams. Advances in Knowledge Discovery and Data Mining. 2021

  26. [69]

    kdd , volume=

    A density-based algorithm for discovering clusters in large spatial databases with noise , author=. kdd , volume=. 1996 , doi=

  27. [70]

    Comparative Analysis Review of Pioneering DBSCAN and Successive Density-Based Clustering Algorithms , year=

    Bushra, Adil Abdu and Yi, Gangman , journal=. Comparative Analysis Review of Pioneering DBSCAN and Successive Density-Based Clustering Algorithms , year=

  28. [71]

    Isolation Forest , year=

    Liu, Fei Tony and Ting, Kai Ming and Zhou, Zhi-Hua , booktitle=. Isolation Forest , year=

  29. [72]

    and Hassanat, Ahmad B

    Tarawneh, Ahmad S. and Hassanat, Ahmad B. and Altarawneh, Ghada Awad and Almuhaimeed, Abdullah , journal=. Stop Oversampling for Class Imbalance Learning: A Review , year=

  30. [74]

    A Comprehensive Survey of Regression-Based Loss Functions for Time Series Forecasting

    Jadon, Aryan and Patil, Avinash and Jadon, Shruti. A Comprehensive Survey of Regression-Based Loss Functions for Time Series Forecasting. Data Management, Analytics and Innovation. 2024

  31. [75]

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , month =

    Ren, Wenqi and Ma, Lin and Zhang, Jiawei and Pan, Jinshan and Cao, Xiaochun and Liu, Wei and Yang, Ming-Hsuan , title =. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , month =. 2018 , url =

  32. [76]

    Learning Structured Output Representation using Deep Conditional Generative Models , url =

    Sohn, Kihyuk and Lee, Honglak and Yan, Xinchen , booktitle =. Learning Structured Output Representation using Deep Conditional Generative Models , url =

  33. [78]

    , author Krawczyk, B

    author Aguiar, G. , author Krawczyk, B. , author Cano, A. , year 2024 . title A survey on learning from imbalanced data streams: taxonomy, challenges, empirical study, and reproducible experimental framework . journal Machine Learning volume 113 , pages 4165--4243 . https://doi.org/10.1007/s10994-023-06353-6, :10.1007/s10994-023-06353-6

  34. [79]

    , author Garc \' a-Remesal, M

    author Aleksic, J. , author Garc \' a-Remesal, M. , year 2025 . title A selective under-sampling (sus) method for imbalanced regression . journal Journal of Artificial Intelligence Research volume 82 , pages 111--136 . https://www.jair.org/index.php/jair/article/view/16062, :https://doi.org/10.1613/jair.1.16062

  35. [81]

    , author Idri, A

    author Araf, I. , author Idri, A. , author Chairi, I. , year 2024 b. title Cost-sensitive learning for imbalanced medical data: a review . journal Artificial Intelligence Review volume 57 , pages 80 . https://doi.org/10.1007/s10462-023-10652-8, :10.1007/s10462-023-10652-8

  36. [82]

    , author Cavalcanti, G.D.C

    author Avelino, J.G. , author Cavalcanti, G.D.C. , author Cruz, R.M.O. , year 2024 . title Resampling strategies for imbalanced regression: a survey and empirical analysis . journal Artificial Intelligence Review volume 57 , pages 82 . https://doi.org/10.1007/s10462-024-10724-3, :10.1007/s10462-024-10724-3

  37. [83]

    , author Cavalcanti, G.D.C

    author Avelino, J.G. , author Cavalcanti, G.D.C. , author Cruz, R.M.O. , year 2025 . title Imbalanced regression pipeline recommendation . journal Machine Learning volume 114 , pages 146 . https://doi.org/10.1007/s10994-025-06766-5, :10.1007/s10994-025-06766-5

  38. [84]

    , author Islam, A

    author Belhaouari, S.B. , author Islam, A. , author Kassoul, K. , author Al-Fuqaha, A. , author Bouzerdoum, A. , year 2024 . title Oversampling techniques for imbalanced data in regression . journal Expert Systems with Applications volume 252 , pages 124118 . https://www.sciencedirect.com/science/article/pii/S0957417424009849, :https://doi.org/10.1016/j.e...

  39. [85]

    , author Torgo, L

    author Branco, P. , author Torgo, L. , author Ribeiro, R.P. , year 2017 . title SMOGN : a pre-processing approach for imbalanced regression , publisher PMLR . pp. pages 36--50 . https://proceedings.mlr.press/v74/branco17a.html

  40. [86]

    , author Torgo, L

    author Branco, P. , author Torgo, L. , author Ribeiro, R.P. , year 2018 . title Rebagg: Resampled bagging for imbalanced regression , in: editor Torgo, L. , editor Matwin, S. , editor Japkowicz, N. , editor Krawczyk, B. , editor Moniz, N. , editor Branco, P. (Eds.), booktitle Proceedings of the Second International Workshop on Learning with Imbalanced Dom...

  41. [87]

    , author Torgo, L

    author Branco, P. , author Torgo, L. , author Ribeiro, R.P. , year 2019 . title Pre-processing approaches for imbalanced distributions in regression . journal Neurocomputing volume 343 , pages 76--99 . https://www.sciencedirect.com/science/article/pii/S0925231219301638, :https://doi.org/10.1016/j.neucom.2018.11.100. note learning in the Presence of Class ...

  42. [88]

    , author Yi, G

    author Bushra, A.A. , author Yi, G. , year 2021 . title Comparative analysis review of pioneering dbscan and successive density-based clustering algorithms . journal IEEE Access volume 9 , pages 87918--87935 . :10.1109/ACCESS.2021.3089036

  43. [89]

    , author Bacao, F

    author Camacho, L. , author Bacao, F. , year 2024 . title Wsmoter: a novel approach for imbalanced regression . journal Applied Intelligence volume 54 , pages 8789--8799 . https://doi.org/10.1007/s10489-024-05608-6, :10.1007/s10489-024-05608-6

  44. [90]

    , year 2017

    author Candelieri, A. , year 2017 . title Clustering and support vector regression for water demand forecasting and anomaly detection . journal Water volume 9 . https://www.mdpi.com/2073-4441/9/3/224, :10.3390/w9030224

  45. [91]

    , author Jia, M

    author Cao, Y. , author Jia, M. , author Zhao, X. , author Yan, X. , author Feng, K. , year 2024 . title Cost-sensitive learning considering label and feature distribution consistency: A novel perspective for health prognosis of rotating machinery with imbalanced data . journal Expert Systems with Applications volume 250 , pages 123930 . https://www.scien...

  46. [92]

    , author Pinho, A.J

    author Carvalho, M. , author Pinho, A.J. , author Br \'a s, S. , year 2025 . title Resampling approaches to handle class imbalance: a review from a data perspective . journal Journal of Big Data volume 12 , pages 71 . https://doi.org/10.1186/s40537-025-01119-4, :10.1186/s40537-025-01119-4

  47. [93]

    , author Bowyer, K.W

    author Chawla, N.V. , author Bowyer, K.W. , author Hall, L.O. , author Kegelmeyer, W.P. , year 2002 . title Smote: synthetic minority over-sampling technique . journal Journal of artificial intelligence research volume 16 , pages 321--357 . https://www.jair.org/index.php/jair/article/view/10302, :https://doi.org/10.1613/jair.953

  48. [94]

    , author Lalor, J

    author Chen, J. , author Lalor, J. , author Liu, W. , author Druhl, E. , author Granillo, E. , author Vimalananda, V.G. , author Yu, H. , year 2019 . title Detecting hypoglycemia incidents reported in patients' secure messages: Using cost-sensitive learning and oversampling to reduce data imbalance . journal J Med Internet Res volume 21 , pages e11990 . h...

  49. [95]

    , author Yang, K

    author Chen, W. , author Yang, K. , author Yu, Z. , author Shi, Y. , author Chen, C.L.P. , year 2024 . title A survey on imbalanced learning: latest research, applications and future directions . journal Artificial Intelligence Review volume 57 , pages 137 . https://doi.org/10.1007/s10462-024-10759-6, :10.1007/s10462-024-10759-6

  50. [96]

    , author Jacobson, K.N

    author Dablain, D. , author Jacobson, K.N. , author Bellinger, C. , author Roberts, M. , author Chawla, N.V. , year 2024 . title Understanding cnn fragility when learning with imbalanced data . journal Machine Learning volume 113 , pages 4785--4810 . https://doi.org/10.1007/s10994-023-06326-9, :10.1007/s10994-023-06326-9

  51. [97]

    , author Jia, M

    author Ding, Y. , author Jia, M. , author Zhuang, J. , author Ding, P. , year 2022 . title Deep imbalanced regression using cost-sensitive learning and deep feature transfer for bearing remaining useful life estimation . journal Applied Soft Computing volume 127 , pages 109271 . https://www.sciencedirect.com/science/article/pii/S1568494622004732, :https:/...

  52. [98]

    , author Chen, J

    author Dolar, T. , author Chen, J. , author Chen, W. , year 2025 . title Uncertainty quantification driven machine learning for improving model accuracy in imbalanced regression tasks . journal Expert Systems with Applications volume 261 , pages 125526 . https://www.sciencedirect.com/science/article/pii/S0957417424023935, :https://doi.org/10.1016/j.eswa.2...

  53. [99]

    , author Kriegel, H.P

    author Ester, M. , author Kriegel, H.P. , author Sander, J. , author Xu, X. , et al., year 1996 . title A density-based algorithm for discovering clusters in large spatial databases with noise , in: booktitle kdd , pp. pages 226--231 . :https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.121.9220

  54. [100]

    , author Liu, Z

    author Feng, C. , author Liu, Z. , author Li, W. , author Lu, X. , author Jing, Y. , author Ma, Y. , year 2025 . title Improved gaussian mixture model and gaussian mixture regression for learning from demonstration based on gaussian noise scattering . journal Advanced Engineering Informatics volume 65 , pages 103192 . https://www.sciencedirect.com/science...

  55. [101]

    , author Garc \' a, S

    author Fern \'a ndez, A. , author Garc \' a, S. , author Galar, M. , author Prati, R.C. , author Krawczyk, B. , author Herrera, F. , year 2018 . title Learning from imbalanced data sets . volume volume 10 . publisher Springer . https://link.springer.com/book/10.1007/978-3-319-98074-4

  56. [102]

    , author Liu, G

    author Gong, Y. , author Liu, G. , author Xue, Y. , author Li, R. , author Meng, L. , year 2023 . title A survey on dataset quality in machine learning . journal Information and Software Technology volume 162 , pages 107268 . https://www.sciencedirect.com/science/article/pii/S0950584923001222, :https://doi.org/10.1016/j.infsof.2023.107268

  57. [103]

    , author Mori, G

    author Gong, Y. , author Mori, G. , author Tung, F. , year 2022 . title R ank S im: Ranking similarity regularization for deep imbalanced regression , in: editor Chaudhuri, K. , editor Jegelka, S. , editor Song, L. , editor Szepesvari, C. , editor Niu, G. , editor Sabato, S. (Eds.), booktitle Proceedings of the 39th International Conference on Machine Lea...

  58. [104]

    , author Singh, A.K

    author Goswami, S. , author Singh, A.K. , year 2024 . title A literature survey on various aspect of class imbalance problem in data mining . journal Multimedia Tools and Applications volume 83 , pages 70025--70050 . https://doi.org/10.1007/s11042-024-18244-6, :10.1007/s11042-024-18244-6

  59. [105]

    , author Malki, J

    author Hajjami, S.E. , author Malki, J. , author Berrada, M. , author Fourka, B. , year 2020 . title Machine learning for anomaly detection. performance study considering anomaly distribution in an imbalanced dataset , in: booktitle 2020 5th International Conference on Cloud Computing and Artificial Intelligence: Technologies and Applications (CloudTech) ...

  60. [106]

    , author Ghoraani, B

    author Hssayeni, M.D. , author Ghoraani, B. , year 2024 . title Deep regression modeling for imbalanced and incomplete time-series data . journal IEEE Transactions on Emerging Topics in Computational Intelligence volume 8 , pages 3767--3778 . :10.1109/TETCI.2024.3372435

  61. [107]

    , author Liu, D.R

    author Huang, Y. , author Liu, D.R. , author Lee, S.J. , author Hsu, C.H. , author Liu, Y.G. , year 2022 . title A boosting resampling method for regression based on a conditional variational autoencoder . journal Information Sciences volume 590 , pages 90--105 . https://www.sciencedirect.com/science/article/pii/S0020025521013207, :https://doi.org/10.1016...

  62. [108]

    , author Imoize, A.L

    author Isabona, J. , author Imoize, A.L. , author Kim, Y. , year 2022 . title Machine learning-based boosted regression ensemble combined with hyperparameter tuning for optimal adaptive learning . journal Sensors volume 22 . https://www.mdpi.com/1424-8220/22/10/3776, :10.3390/s22103776

  63. [109]

    , author Patil, A

    author Jadon, A. , author Patil, A. , author Jadon, S. , year 2024 . title A comprehensive survey of regression-based loss functions for time series forecasting , in: editor Sharma, N. , editor Goje, A.C. , editor Chakrabarti, A. , editor Bruckstein, A.M. (Eds.), booktitle Data Management, Analytics and Innovation , publisher Springer Nature Singapore , a...

  64. [110]

    , author Pannu, H.S

    author Kaur, H. , author Pannu, H.S. , author Malhi, A.K. , year 2019 . title A systematic review on imbalanced data challenges in machine learning: Applications and solutions . journal ACM Comput. Surv. volume 52 . https://doi.org/10.1145/3343440, :10.1145/3343440

  65. [111]

    , author Chaudhari, O

    author Khan, A.A. , author Chaudhari, O. , author Chandra, R. , year 2024 . title A review of ensemble learning and data augmentation models for class imbalanced problems: Combination, implementation and evaluation . journal Expert Systems with Applications volume 244 , pages 122778 . https://www.sciencedirect.com/science/article/pii/S0957417423032803, :h...

  66. [112]

    , author Siriborvornratanakul, T

    author Komsrimorakot, P. , author Siriborvornratanakul, T. , year 2025 . title Enhancing fraud detection in imbalanced motor insurance datasets using cp-smote and random under-sampling . journal Journal of Big Data volume 12 , pages 172 . https://doi.org/10.1186/s40537-025-01217-3, :10.1186/s40537-025-01217-3

  67. [113]

    , author Krawczyk, B

    author Korycki, . , author Krawczyk, B. , year 2021 . title Low-dimensional representation learning from imbalanced data streams , in: editor Karlapalem, K. , editor Cheng, H. , editor Ramakrishnan, N. , editor Agrawal, R.K. , editor Reddy, P.K. , editor Srivastava, J. , editor Chakraborty, T. (Eds.), booktitle Advances in Knowledge Discovery and Data Min...

  68. [114]

    , year 2020

    author Koziarski, M. , year 2020 . title Radial-based undersampling for imbalanced data classification . journal Pattern Recognition volume 102 , pages 107262 . https://www.sciencedirect.com/science/article/pii/S0031320320300674, :https://doi.org/10.1016/j.patcog.2020.107262

  69. [115]

    Kubat, M., Matwin, S., et al., 1997. Addressing the curse of imbalanced training sets: one-sided selection, in: ICML, Citeseer. p. 179

  70. [116]

    Lavrač, N., Podpečan, V., Robnik-Šikonja, M., 2021. Representation learning. Springer. https://link.springer.com/book/10.1007/978-3-030-68817-2#back-to-top, doi:https://doi.org/10.1007/978-3-030-68817-2

  71. [117]

    Li, X., Li, W., Yu, X., Han, Z., Jin, Q., 2025. Financial risk assessment of imbalanced data based on nonlinear causal time-series network. Information Processing and Management 62, 104025. https://www.sciencedirect.com/science/article/pii/S0306457324003844, doi:https://doi.org/10.10...

  72. [118]

    Li, Y., Jin, J., Ma, J., Zhu, F., Jin, B., Liang, J., Philip Chen, C., 2023. Imbalanced least squares regression with adaptive weight learning. Information Sciences 648, 119541. https://www.sciencedirect.com/science/article/pii/S002002552301126X, doi:https://doi.org/...

  73. [119]

    Liu, J., 2022. Importance-SMOTE: a synthetic minority oversampling method for noisy imbalanced data. Soft Computing 26, 1141–1163. https://doi.org/10.1007/s00500-021-06532-4, doi:10.1007/s00500-021-06532-4

  74. [120]

    Lucas, A., Pegios, K., Kotsakis, E., Clarke, D., 2020. Price forecasting for the balancing energy market using machine-learning regression. Energies 13. https://www.mdpi.com/1996-1073/13/20/5420, doi:10.3390/en13205420

  75. [121]

    Mendes-Moreira, J., Soares, C., Jorge, A.M., Sousa, J.F.D., 2012. Ensemble approaches for regression: A survey. ACM Comput. Surv. 45. https://doi.org/10.1145/2379776.2379786, doi:10.1145/2379776.2379786

  76. [122]

    Miao, F., Fu, N., Zhang, Y.T., Ding, X.R., Hong, X., He, Q., Li, Y., 2017. A novel continuous blood pressure estimation approach based on data mining techniques. IEEE Journal of Biomedical and Health Informatics 21, 1730–1740. doi:10.1109/JBHI.2017.2691715

  77. [123]

    Moniz, N., Branco, P., Torgo, L., 2017. Evaluation of ensemble methods in imbalanced regression tasks, in: Luís Torgo, P.B., Moniz, N. (Eds.), Proceedings of the First International Workshop on Learning with Imbalanced Domains: Theory and Applications, PMLR. pp. 129–140. htt...

  78. [124]

    Moniz, N., Ribeiro, R., Cerqueira, V., Chawla, N., 2018. SMOTEBoost for regression: Improving the prediction of extreme values, in: 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA), pp. 150–159. doi:10.1109/DSAA.2018.00025

  79. [125]

    Ohno, H., 2020. Auto-encoder-based generative models for data augmentation on regression problems. Soft Computing 24, 7999–8009. https://doi.org/10.1007/s00500-019-04094-0, doi:10.1007/s00500-019-04094-0

  80. [126]

    Orhobor, O.I., Grinberg, N.F., Soldatova, L.N., King, R.D., 2023. Imbalanced regression using regressor-classifier ensembles. Machine Learning 112, 1365–1387. https://doi.org/10.1007/s10994-022-06199-4, doi:10.1007/s10994-022-06199-4

Showing first 80 references.