pith. machine review for the scientific record.

arXiv: 1810.11363 · v1 · submitted 2018-10-24 · cs.LG · cs.MS · stat.ML

Recognition: unknown

CatBoost: gradient boosting with categorical features support

classification cs.LG · cs.MS · stat.ML
keywords boosting, gradient, algorithm, available, catboost, categorical, features, implementation

In this paper we present CatBoost, a new open-sourced gradient boosting library that successfully handles categorical features and outperforms existing publicly available implementations of gradient boosting in terms of quality on a set of popular publicly available datasets. The library has a GPU implementation of the learning algorithm and a CPU implementation of the scoring algorithm, which are significantly faster than other gradient boosting libraries on ensembles of similar sizes.

This paper has not been read by Pith yet.


Forward citations

Cited by 8 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Context-Aware Web Attack Detection in Open-Source SIEM Systems via MITRE ATT&CK-Enriched Behavioral Profiling

    cs.CR 2026-05 conditional novelty 5.0

    Smart-SIEM adds context-aware ML profiling to Wazuh SIEM, lifting binary attack detection F1 to 0.967 and six-class categorization to 0.914 while recovering from concept drift via retraining.

  2. Comparative analysis of missing data imputation methods for CSST survey: Impact on photometric redshift estimation performance

    astro-ph.GA 2026-05 conditional novelty 5.0

    KNN imputation gives highest photo-z accuracy under ideal random missingness with complete training data, while SAITS is more robust for incomplete training sets and realistic mixed missingness patterns in CSST data.

  3. Mind the Gap? A Distributional Comparison of Real and Synthetic Priors for Tabular Foundation Models

    cs.AI 2026-05 unverdicted novelty 5.0

    The synthetic prior for tabular foundation models covers only a narrow part of real table distributions, but this mismatch does not degrade model generalization.

  4. A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification

    cs.LG 2021-07 unverdicted novelty 5.0

    Pith review generated a malformed one-line summary.

  5. AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data

    stat.ML 2020-03 unverdicted novelty 5.0

    AutoGluon-Tabular achieves superior accuracy on tabular classification and regression by multi-layer model ensembling and stacking, outperforming other AutoML frameworks on 50 benchmarks and Kaggle competitions.

  6. Search for quasar pairs with Gaia astrometric data. II. Photometric redshift prediction with machine learning for the MGQPC catalogue

    astro-ph.GA 2026-05 conditional novelty 4.0

    Machine learning models achieve NMAD 0.036 and 5.6% outliers for quasar photometric redshifts, identifying 185 high-probability pair candidates in MGQPC with 20 spectroscopically confirmed as physical pairs.

  7. Donor-Aware scRNA-seq Benchmarks for IBD Classification

    q-bio.QM 2026-05 unverdicted novelty 4.0

    Donor-aware benchmarks show AUROCs up to 0.978 for IBD classification from scRNA-seq using CLR cell-type compositions and GatedStructuralCFN embeddings, with compartment stratification improving both performance and f...

  8. From Canopy to Collision: A Hybrid Predictive Framework for Identifying Risk Factors in Tree-Involved Traffic Crashes

    cs.LG 2026-04 unverdicted novelty 4.0

    Hybrid predictive modeling of crash data identifies non-use of restraints as the primary risk factor for severe injury in collisions involving trees.