pith. machine review for the scientific record.

arXiv: 1603.02754 · v3 · submitted 2016-03-09 · cs.LG

Recognition: unknown

XGBoost: A Scalable Tree Boosting System

keywords: tree, boosting, data, learning, scalable, system, xgboost, insights

Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges. We propose a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning. More importantly, we provide insights on cache access patterns, data compression and sharding to build a scalable tree boosting system. By combining these insights, XGBoost scales beyond billions of examples using far fewer resources than existing systems.

Forward citations

Cited by 13 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Observation of the charmless purely baryonic decay $\mathinner{\mathit{\Lambda}^0_b\!\to \mathit{\Lambda} p \overline{p}}$

    hep-ex 2026-05 conditional novelty 8.0

    First observation of Λ_b^0 → Λ p p-bar with 5.1σ significance and relative branching fraction (5.1 ± 1.3(stat) ± 0.3(syst)) × 10^{-2} to the reference mode Λ_b^0 → Λ K^+ K^-.

  2. A satellite foundation model for improved wealth monitoring

    cs.CY 2026-04 unverdicted novelty 7.0

    Tempov is a self-supervised satellite foundation model that predicts wealth levels and decadal changes at high resolution across Africa from Landsat imagery, outperforming baselines even with limited labels and genera...

  3. SynQL: A Controllable and Scalable Rule-Based Framework for SQL Workload Synthesis for Performance Benchmarking

    cs.DB 2026-04 unverdicted novelty 7.0

    SynQL synthesizes diverse, execution-ready SQL workloads by deterministically traversing foreign-key graphs to populate ASTs, yielding high topological entropy and cost-model training data with R² ≥ 0.79 on held-out sets.

  4. Identifying Changing-Look AGN Transitions in Light Curve Data with the Zwicky Transient Facility

    astro-ph.GA 2026-04 unverdicted novelty 6.0

    A criterion of |Δg| > 0.4 mag and |Δ(g-r)| > 0.2 mag detects photometric CL-AGN transitions in 9.6% of known hosts with 1.6% false positive rate from simulations.

  5. Search for pair production of additional neutral scalars within the Inert Doublet Model in a final state with two electrons or two muons in proton-proton collisions at $\sqrt{s}$ = 13 TeV and 13.6 TeV

    hep-ex 2026-05 accept novelty 5.0

    No significant excess found; new exclusion limits reach m_H = 108 GeV for m_H - m_A = 78 GeV in the Inert Doublet Model.

  6. From Gaia to GaiaNIR: II. A new view of the Milky Way bar

    astro-ph.GA 2026-04 unverdicted novelty 5.0

    Gaia DR3 data shows the Milky Way bar pattern speed is biased high by 14.4 km s^{-1} kpc^{-1}, with a bias-corrected estimate of 29.3 ± 2.3 km s^{-1} kpc^{-1}.

  7. Predicting Redshift in Seyfert Galaxies Using Machine Learning

    astro-ph.GA 2026-04 conditional novelty 4.0

    Random Forest regression on combined optical plus mid-infrared colors yields NMAD of 0.0188, R-squared of 0.9561, and 0.294 percent outliers for photometric redshifts in 23,797 Seyfert II galaxies selected from SDSS and WISE.

  8. AMO-ENE: Attention-based Multi-Omics Fusion Model for Outcome Prediction in Extra Nodal Extension and HPV-associated Oropharyngeal Cancer

    eess.IV 2026-04 unverdicted novelty 4.0

    An attention-based fusion model combining semi-supervised CT segmentation, radiomics, and clinical features predicts metastatic recurrence, overall survival, and disease-free survival in HPV+ oropharyngeal cancer with...

  9. Exotic Higgs Decays at a Muon Collider

    hep-ph 2026-04 unverdicted novelty 4.0

    Muon colliders at 3 TeV and 10 TeV can probe branching ratios for h to SS decays in 4b and 2b2μ channels down to 10^{-3}–10^{-5}, improving on HL-LHC projections using machine learning.

  10. Probing Heavy Neutral Higgs Bosons via Single Vector-Like Bottom Quark Production at the HL-LHC

    hep-ph 2026-03 unverdicted novelty 4.0

    XGBoost multivariate analysis extends the 5-sigma discovery reach for singly produced vector-like bottom quarks decaying via heavy neutral Higgs bosons to 1.6 TeV at the HL-LHC with 3 ab^{-1}.

  11. Prospects for Measuring $H \to \mathrm{invisible}$ at the FCCee

    hep-ph 2026-05 unverdicted novelty 3.0

    FCC-ee at 240 GeV with 10.8 ab^{-1} could set a 95% CL upper limit of 0.15% on the branching ratio of Higgs to invisible particles.

  12. Probing the electron Yukawa coupling via resonant Higgs boson production at FCC-ee via $e^+e^- \to H \to WW^*$ in lepton-plus-jets final states

    hep-ph 2026-04 unverdicted novelty 3.0

    Simulation projects 2.0 sigma significance for resonant Higgs production at FCC-ee, yielding an upper limit of kappa_e less than or equal to 1.35 at 95% CL on the electron Yukawa coupling modifier with 10 ab inverse l...

  13. Machine Learning Study on Single Production of a Singlet Vector-like Lepton at the Large Hadron Collider

    hep-ph 2026-04 unverdicted novelty 3.0

    XGBoost machine learning improves discrimination in LHC searches for singlet vector-like leptons, yielding projected 2σ mass exclusion limits of 620 GeV (three-lepton) and 490 GeV (four-lepton) at 14 TeV with 3000 fb^{-1}.