{"work":{"id":"c12cf8a8-1951-412c-a3cd-92de4cb10ca7","openalex_id":null,"doi":"10.1145/2939672.2939785","arxiv_id":null,"raw_key":"raw:ed14ad477173c858a5ec857a","title":"Xgboost: A scalable tree boosting system","authors":[{"given":"Tianqi","family":"Chen","sequence":"first","affiliation":[{"name":"University of Washington, Seattle, WA, USA"}]},{"given":"Carlos","family":"Guestrin","sequence":"additional","affiliation":[{"name":"University of Washington, Seattle, WA, USA"}]}],"authors_text":"Tianqi Chen and Carlos Guestrin","year":2016,"venue":"Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","abstract":null,"external_url":"https://doi.org/10.1145/2939672.2939785","cited_by_count":41018,"metadata_source":"raw_reference","metadata_fetched_at":"2026-05-26T12:07:16.939530+00:00","pith_arxiv_id":null,"created_at":"2026-05-12T19:46:49.054893+00:00","updated_at":"2026-05-26T12:07:16.939530+00:00","title_quality_ok":true,"display_title":"Xgboost: A scalable tree boosting system","render_title":"Xgboost: A scalable tree boosting system"},"hub":{"state":{"work_id":"c12cf8a8-1951-412c-a3cd-92de4cb10ca7","tier":"super_hub","tier_reason":"100+ Pith inbound or 10,000+ external citations","pith_inbound_count":16,"external_cited_by_count":41018,"distinct_field_count":6,"first_pith_cited_at":"2025-10-12T05:24:32+00:00","last_pith_cited_at":"2026-05-18T18:00:45+00:00","author_build_status":"needed","summary_status":"needed","contexts_status":"needed","graph_status":"needed","ask_index_status":"needed","reader_status":"not_needed","recognition_status":"not_needed","updated_at":"2026-06-01T21:33:54.047780+00:00","tier_text":"super_hub"},"tier":"super_hub","role_counts":[{"context_role":"baseline","n":4},{"context_role":"method","n":2}],"polarity_counts":[{"context_polarity":"baseline","n":4},{"context_polarity":"use_method","n":2}],"runs":{"ask_index":{"job_type":"ask_index","status":"succeeded","result":{"title":"Xgboost: A scalable tree boosting system","claims":[{"claim_text":"a: Single-span bridges do not have a bent and are not included in the ML-based predictions. b: There is a small proportion of bridges that have more than 7 columns per bent. These bridges are considered as outliers and have been excluded before the model training. Figure 7: Proposed classifier chain for imputing missing attributes. An XGBoost classifier [40] is used as the predictive model to impute the four target attributes. The hyperparameters of each classifier are tuned using Bayesian optim","claim_type":"method","confidence":0.95,"evidence_strength":"citation_context"},{"claim_text":"Despite their efficiency, such adaptation can be unstable and prone to overfitting when supervision is scarce [48, 10, 23], highlighting the need for more data-efficient and robust fine-tuning strategies. Interestingly, the tabular learning community has long relied on a different paradigm to address similar challenges: gradient boosting. Systems such as XGBoost [ 5], LightGBM [20], and CatBoost [35] consistently achieve strong performance across a wide range of tabular tasks and are known to be","claim_type":"baseline","confidence":0.95,"evidence_strength":"citation_context"},{"claim_text":"wi for each paper by log-normalizing and aggregating its GitHub stars, citation counts, influential citations, and Altmetric score. The distribution of the four log-normalized ground-truth impact metrics utilized in the dataset is shown in Figure 4. Baselines.We benchmark FAME against three distinct categories of evaluators. First, we evaluate ML models, including XGBoost [9], SVR [11, 27], Transformer [31] and TGCN [39], trained directly 5 Table 1: Prospective forecasting performance across an ","claim_type":"baseline","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"Extreme wildfires are challenging to predict [41], as they emerge from the complex interplay of fire weather [9, 10, 37], topography [33], vegetation fuels [32, 37], and human factors such as ignition and fire suppression [16, 19, 26, 44], all of which are difficult to fully represent in process-based wildfire models. Whereas machine learning (ML) approaches such as XGBoost [8] have shown promise in wildfire prediction [5, 18, 21, 25, 42], outperforming process-based wildfire models [41], they t","claim_type":"method","confidence":0.85,"evidence_strength":"citation_context"}],"why_cited":"Pith tracks Xgboost: A scalable tree boosting system because it crossed a citation-hub threshold. Current citing contexts most often use it as baseline evidence (2 contexts).","role_counts":[{"n":2,"context_role":"baseline"},{"n":2,"context_role":"method"}]},"error":null,"updated_at":"2026-05-19T03:31:23.438327+00:00"},"author_expand":{"job_type":"author_expand","status":"succeeded","result":{"authors_linked":[{"id":"a9e3ca51-d13e-40d6-9fff-7d3d53ad8b1c","orcid":null,"display_name":"Tianqi Chen"},{"id":"5ee1a1f4-ade6-444d-a2ba-aa84c735be28","orcid":null,"display_name":"Carlos Guestrin"}]},"error":null,"updated_at":"2026-05-19T03:31:27.815422+00:00"},"context_extract":{"job_type":"context_extract","status":"succeeded","result":{"enqueued_papers":25},"error":null,"updated_at":"2026-05-19T03:31:30.273534+00:00"},"graph_features":{"job_type":"graph_features","status":"succeeded","result":{"co_cited":[{"title":"arXiv:2506.16791 [cs]","work_id":"155b5349-dee8-4870-965c-d54a700a19de","shared_citers":4},{"title":"arXiv preprint arXiv:2410.18164 , year=","work_id":"6dea29bf-5c1e-4430-8f09-25c430201a44","shared_citers":3},{"title":"Catboost: unbiased boosting with categorical features.Advances in neural information processing systems, 31, 2018","work_id":"cb38f37f-2ff1-4119-9239-170112c586b9","shared_citers":3},{"title":"Lightgbm: A highly efficient gradient boosting decision tree.Advances in neural information processing systems, 30","work_id":"a9b1bdee-027d-4982-80a9-f2f6ac2c4858","shared_citers":3},{"title":"TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second","work_id":"2b2cf99d-d6e4-4a96-9e61-f783356a2581","shared_citers":3},{"title":"AceReason-Nemotron: Advancing math and code reasoning through reinforcement learning","work_id":"5ca62627-18c8-4130-9e2d-a026a3d530d3","shared_citers":2},{"title":"Adam: A Method for Stochastic Optimization","work_id":"1910796d-9b52-4683-bf5c-de9632c1028b","shared_citers":2},{"title":"Adapting tabpfn for zero-inflated metagenomic data","work_id":"be255465-64f6-4bec-abb6-1cd96199fd74","shared_citers":2},{"title":"Adjuster this! tabpfn for solar forecast error adjustment.https://gist.github","work_id":"b67c6309-3b49-4c67-a3a6-6a59e0c924f4","shared_citers":2},{"title":"Advanced deep learning enables prediction of allogeneic stem cell mobilization success","work_id":"d79f1364-81fd-4719-a64f-64be43e86ecc","shared_citers":2},{"title":"Advancing biogeographical ancestry predictions through machine learning","work_id":"6ac91f9f-80a6-4c79-9209-427dd40c32be","shared_citers":2},{"title":"Alzakari, Abdullah Aldrees, Muhammad Fahad Umer, Luca Cascone, Nader Innab, and Imran Ashraf","work_id":"3c89cf3e-5863-41f9-9b01-7a421c0e28d1","shared_citers":2},{"title":"A machine learning-based approach for individualized prediction of short-term outcomes after anterior cervical corpectomy.Asian Spine Journal, 18(4):541–549, 2024","work_id":"861d621f-e198-4c91-94e2-ee05e330dfaa","shared_citers":2},{"title":"An rf-tabpfn-based framework for few-shot iot network attack recognition using lasso-rfe feature selection.IEEE Access, 13:151452–151465,","work_id":"8175d85b-f000-4633-aa25-65c12165f5e7","shared_citers":2},{"title":"Application of machine learning in caisson inclination prediction: Model performance comparison and interpretability analysis.Underground Space, 2025","work_id":"510d7df9-ba83-4771-9022-f7626c0a7941","shared_citers":2},{"title":"Application of tabpfn model on the energy performance improvement of high-power multistage centrifugal pump.Energy, 2025","work_id":"4f0bfbd2-2c08-48c8-bb98-63b192f88144","shared_citers":2},{"title":"Artificial intelligence for predicting post-excision recurrence and malignant progression in oral potentially malignant disorders: a retrospective cohort study","work_id":"8cc63eb8-22fe-4136-a5ee-459aa8d2681c","shared_citers":2},{"title":"arXiv preprint arXiv:2506.10914 , year=","work_id":"7236d2c7-404d-4193-a35f-ebf5175d2048","shared_citers":2},{"title":"A target-specific machine learning framework for predicting fuel blend properties","work_id":"319482c7-3b08-4c39-9727-a827230b5d80","shared_citers":2},{"title":"Attention is all you need.Advances in neural information processing systems, 30","work_id":"751efe07-5e91-415c-b3d1-f4734aa26960","shared_citers":2},{"title":"Attention is all you need.Advances in neural information processing systems, 30, 2017","work_id":"869861e6-4680-4369-bac3-12785d201be1","shared_citers":2},{"title":"Autoenergy: An automated feature engineering algorithm for energy consumption forecasting with automl.Knowledge-Based Systems, 2025","work_id":"79d71a76-edcf-4bd5-867b-6d678c2fdd5f","shared_citers":2},{"title":"AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data","work_id":"32ca4e6c-bd72-4586-8594-40eb6bcb6582","shared_citers":2},{"title":"Automated supervised identification of thunderstorm ground enhancements (tges).arXiv preprint arXiv:2510.25125, 2025","work_id":"3ee77b7f-a718-48e2-b004-03d1ff8e56d6","shared_citers":2}],"time_series":[{"n":2,"year":2025},{"n":8,"year":2026}],"dependency_candidates":[{"n":1,"role":"method","polarity":"use_method","paper_title":"Environment-Adaptive Preference Optimization for Wildfire Prediction","primary_cat":"cs.LG","context_text":"Extreme wildfires are challenging to predict [41], as they emerge from the complex interplay of fire weather [9, 10, 37], topography [33], vegetation fuels [32, 37], and human factors such as ignition and fire suppression [16, 19, 26, 44], all of which are difficult to fully represent in process-based wildfire models. Whereas machine learning (ML) approaches such as XGBoost [8] have shown promise in wildfire prediction [5, 18, 21, 25, 42], outperforming process-based wildfire models [41], they typically require extensive historical fire data for training and may struggle to generalize to novel fire regimes that emerge under climate change [ 14]. As climate change causes shifts in the spatial pattern, seasonality, and statistical distribution of fire","citing_arxiv_id":"2605.12435"},{"n":1,"role":"baseline","polarity":"baseline","paper_title":"FAME: Forecasting Academic Impact via Continuous-Time Manifold Evolution","primary_cat":"cs.LG","context_text":"wi for each paper by log-normalizing and aggregating its GitHub stars, citation counts, influential citations, and Altmetric score. The distribution of the four log-normalized ground-truth impact metrics utilized in the dataset is shown in Figure 4. Baselines.We benchmark FAME against three distinct categories of evaluators. First, we evaluate ML models, including XGBoost [9], SVR [11, 27], Transformer [31] and TGCN [39], trained directly 5 Table 1: Prospective forecasting performance across an 18-month sliding window evaluation from June 2024 to November 2025. Performance is measured by the Spearman rank correlationρs between predicted and ground-truth composite impact weights. The experiments are carried out three times,","citing_arxiv_id":"2605.07208"},{"n":1,"role":"method","polarity":"use_method","paper_title":"Quantifying Exposure Information Uncertainty in Regional Risk Assessment","primary_cat":"stat.AP","context_text":"a: Single-span bridges do not have a bent and are not included in the ML-based predictions. b: There is a small proportion of bridges that have more than 7 columns per bent. These bridges are considered as outliers and have been excluded before the model training. Figure 7: Proposed classifier chain for imputing missing attributes. An XGBoost classifier [40] is used as the predictive model to impute the four target attributes. The hyperparameters of each classifier are tuned using Bayesian optimization [41], which is conducted within a stratified five-fold cross- validation framework. Early stopping is employed during the training by monitoring validation performance within each fold, such that the boosting process is terminated when no further improvement is observed after a specified number","citing_arxiv_id":"2605.08272"},{"n":1,"role":"baseline","polarity":"baseline","paper_title":"BoostLLM: Boosting-inspired LLM Fine-tuning for Few-shot Tabular Classification","primary_cat":"cs.LG","context_text":"Despite their efficiency, such adaptation can be unstable and prone to overfitting when supervision is scarce [48, 10, 23], highlighting the need for more data-efficient and robust fine-tuning strategies. Interestingly, the tabular learning community has long relied on a different paradigm to address similar challenges: gradient boosting. Systems such as XGBoost [ 5], LightGBM [20], and CatBoost [35] consistently achieve strong performance across a wide range of tabular tasks and are known to be particularly robust in data-limited settings [12, 13, 49, 8, 36]. Boosting constructs models in a stage-wise manner, where learners are added sequentially to correct the residual errors of previous ones. This additive training strategy encourages models to focus on informative residual errors","citing_arxiv_id":"2605.06117"}]},"error":null,"updated_at":"2026-05-19T03:31:27.836804+00:00"},"identity_refresh":{"job_type":"identity_refresh","status":"succeeded","result":{"items":[{"title":"Qwen3 Technical Report","outcome":"unchanged","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","resolver":"local_arxiv","confidence":0.98,"old_work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e"}],"counts":{"fixed":0,"merged":0,"unchanged":1,"quarantined":0,"needs_external_resolution":0},"errors":[],"attempted":1},"error":null,"updated_at":"2026-05-19T03:31:27.632235+00:00"},"role_polarity":{"job_type":"role_polarity","status":"succeeded","result":{"title":"Xgboost: A scalable tree boosting system","claims":[{"claim_text":"a: Single-span bridges do not have a bent and are not included in the ML-based predictions. b: There is a small proportion of bridges that have more than 7 columns per bent. These bridges are considered as outliers and have been excluded before the model training. Figure 7: Proposed classifier chain for imputing missing attributes. An XGBoost classifier [40] is used as the predictive model to impute the four target attributes. The hyperparameters of each classifier are tuned using Bayesian optim","claim_type":"method","confidence":0.95,"evidence_strength":"citation_context"},{"claim_text":"Despite their efficiency, such adaptation can be unstable and prone to overfitting when supervision is scarce [48, 10, 23], highlighting the need for more data-efficient and robust fine-tuning strategies. Interestingly, the tabular learning community has long relied on a different paradigm to address similar challenges: gradient boosting. Systems such as XGBoost [ 5], LightGBM [20], and CatBoost [35] consistently achieve strong performance across a wide range of tabular tasks and are known to be","claim_type":"baseline","confidence":0.95,"evidence_strength":"citation_context"},{"claim_text":"wi for each paper by log-normalizing and aggregating its GitHub stars, citation counts, influential citations, and Altmetric score. The distribution of the four log-normalized ground-truth impact metrics utilized in the dataset is shown in Figure 4. Baselines.We benchmark FAME against three distinct categories of evaluators. First, we evaluate ML models, including XGBoost [9], SVR [11, 27], Transformer [31] and TGCN [39], trained directly 5 Table 1: Prospective forecasting performance across an ","claim_type":"baseline","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"Extreme wildfires are challenging to predict [41], as they emerge from the complex interplay of fire weather [9, 10, 37], topography [33], vegetation fuels [32, 37], and human factors such as ignition and fire suppression [16, 19, 26, 44], all of which are difficult to fully represent in process-based wildfire models. Whereas machine learning (ML) approaches such as XGBoost [8] have shown promise in wildfire prediction [5, 18, 21, 25, 42], outperforming process-based wildfire models [41], they t","claim_type":"method","confidence":0.85,"evidence_strength":"citation_context"}],"why_cited":"Pith tracks Xgboost: A scalable tree boosting system because it crossed a citation-hub threshold. Current citing contexts most often use it as baseline evidence (2 contexts).","role_counts":[{"n":2,"context_role":"baseline"},{"n":2,"context_role":"method"}]},"error":null,"updated_at":"2026-05-19T03:31:27.637761+00:00"},"summary_claims":{"job_type":"summary_claims","status":"succeeded","result":{"title":"Xgboost: A scalable tree boosting system","claims":[{"claim_text":"a: Single-span bridges do not have a bent and are not included in the ML-based predictions. b: There is a small proportion of bridges that have more than 7 columns per bent. These bridges are considered as outliers and have been excluded before the model training. Figure 7: Proposed classifier chain for imputing missing attributes. An XGBoost classifier [40] is used as the predictive model to impute the four target attributes. The hyperparameters of each classifier are tuned using Bayesian optim","claim_type":"method","confidence":0.95,"evidence_strength":"citation_context"},{"claim_text":"Despite their efficiency, such adaptation can be unstable and prone to overfitting when supervision is scarce [48, 10, 23], highlighting the need for more data-efficient and robust fine-tuning strategies. Interestingly, the tabular learning community has long relied on a different paradigm to address similar challenges: gradient boosting. Systems such as XGBoost [ 5], LightGBM [20], and CatBoost [35] consistently achieve strong performance across a wide range of tabular tasks and are known to be","claim_type":"baseline","confidence":0.95,"evidence_strength":"citation_context"},{"claim_text":"wi for each paper by log-normalizing and aggregating its GitHub stars, citation counts, influential citations, and Altmetric score. The distribution of the four log-normalized ground-truth impact metrics utilized in the dataset is shown in Figure 4. Baselines.We benchmark FAME against three distinct categories of evaluators. First, we evaluate ML models, including XGBoost [9], SVR [11, 27], Transformer [31] and TGCN [39], trained directly 5 Table 1: Prospective forecasting performance across an ","claim_type":"baseline","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"Extreme wildfires are challenging to predict [41], as they emerge from the complex interplay of fire weather [9, 10, 37], topography [33], vegetation fuels [32, 37], and human factors such as ignition and fire suppression [16, 19, 26, 44], all of which are difficult to fully represent in process-based wildfire models. Whereas machine learning (ML) approaches such as XGBoost [8] have shown promise in wildfire prediction [5, 18, 21, 25, 42], outperforming process-based wildfire models [41], they t","claim_type":"method","confidence":0.85,"evidence_strength":"citation_context"}],"why_cited":"Pith tracks Xgboost: A scalable tree boosting system because it crossed a citation-hub threshold. Current citing contexts most often use it as baseline evidence (2 contexts).","role_counts":[{"n":2,"context_role":"baseline"},{"n":2,"context_role":"method"}]},"error":null,"updated_at":"2026-05-19T03:31:27.641811+00:00"}},"summary":{"title":"Xgboost: A scalable tree boosting system","claims":[{"claim_text":"a: Single-span bridges do not have a bent and are not included in the ML-based predictions. b: There is a small proportion of bridges that have more than 7 columns per bent. These bridges are considered as outliers and have been excluded before the model training. Figure 7: Proposed classifier chain for imputing missing attributes. An XGBoost classifier [40] is used as the predictive model to impute the four target attributes. The hyperparameters of each classifier are tuned using Bayesian optim","claim_type":"method","confidence":0.95,"evidence_strength":"citation_context"},{"claim_text":"Despite their efficiency, such adaptation can be unstable and prone to overfitting when supervision is scarce [48, 10, 23], highlighting the need for more data-efficient and robust fine-tuning strategies. Interestingly, the tabular learning community has long relied on a different paradigm to address similar challenges: gradient boosting. Systems such as XGBoost [ 5], LightGBM [20], and CatBoost [35] consistently achieve strong performance across a wide range of tabular tasks and are known to be","claim_type":"baseline","confidence":0.95,"evidence_strength":"citation_context"},{"claim_text":"wi for each paper by log-normalizing and aggregating its GitHub stars, citation counts, influential citations, and Altmetric score. The distribution of the four log-normalized ground-truth impact metrics utilized in the dataset is shown in Figure 4. Baselines.We benchmark FAME against three distinct categories of evaluators. First, we evaluate ML models, including XGBoost [9], SVR [11, 27], Transformer [31] and TGCN [39], trained directly 5 Table 1: Prospective forecasting performance across an ","claim_type":"baseline","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"Extreme wildfires are challenging to predict [41], as they emerge from the complex interplay of fire weather [9, 10, 37], topography [33], vegetation fuels [32, 37], and human factors such as ignition and fire suppression [16, 19, 26, 44], all of which are difficult to fully represent in process-based wildfire models. Whereas machine learning (ML) approaches such as XGBoost [8] have shown promise in wildfire prediction [5, 18, 21, 25, 42], outperforming process-based wildfire models [41], they t","claim_type":"method","confidence":0.85,"evidence_strength":"citation_context"}],"why_cited":"Pith tracks Xgboost: A scalable tree boosting system because it crossed a citation-hub threshold. Current citing contexts most often use it as baseline evidence (2 contexts).","role_counts":[{"n":2,"context_role":"baseline"},{"n":2,"context_role":"method"}]},"graph":{"co_cited":[{"title":"arXiv:2506.16791 [cs]","work_id":"155b5349-dee8-4870-965c-d54a700a19de","shared_citers":4},{"title":"arXiv preprint arXiv:2410.18164 , year=","work_id":"6dea29bf-5c1e-4430-8f09-25c430201a44","shared_citers":3},{"title":"Catboost: unbiased boosting with categorical features.Advances in neural information processing systems, 31, 2018","work_id":"cb38f37f-2ff1-4119-9239-170112c586b9","shared_citers":3},{"title":"Lightgbm: A highly efficient gradient boosting decision tree.Advances in neural information processing systems, 30","work_id":"a9b1bdee-027d-4982-80a9-f2f6ac2c4858","shared_citers":3},{"title":"TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second","work_id":"2b2cf99d-d6e4-4a96-9e61-f783356a2581","shared_citers":3},{"title":"AceReason-Nemotron: Advancing math and code reasoning through reinforcement learning","work_id":"5ca62627-18c8-4130-9e2d-a026a3d530d3","shared_citers":2},{"title":"Adam: A Method for Stochastic Optimization","work_id":"1910796d-9b52-4683-bf5c-de9632c1028b","shared_citers":2},{"title":"Adapting tabpfn for zero-inflated metagenomic data","work_id":"be255465-64f6-4bec-abb6-1cd96199fd74","shared_citers":2},{"title":"Adjuster this! tabpfn for solar forecast error adjustment.https://gist.github","work_id":"b67c6309-3b49-4c67-a3a6-6a59e0c924f4","shared_citers":2},{"title":"Advanced deep learning enables prediction of allogeneic stem cell mobilization success","work_id":"d79f1364-81fd-4719-a64f-64be43e86ecc","shared_citers":2},{"title":"Advancing biogeographical ancestry predictions through machine learning","work_id":"6ac91f9f-80a6-4c79-9209-427dd40c32be","shared_citers":2},{"title":"Alzakari, Abdullah Aldrees, Muhammad Fahad Umer, Luca Cascone, Nader Innab, and Imran Ashraf","work_id":"3c89cf3e-5863-41f9-9b01-7a421c0e28d1","shared_citers":2},{"title":"A machine learning-based approach for individualized prediction of short-term outcomes after anterior cervical corpectomy.Asian Spine Journal, 18(4):541–549, 2024","work_id":"861d621f-e198-4c91-94e2-ee05e330dfaa","shared_citers":2},{"title":"An rf-tabpfn-based framework for few-shot iot network attack recognition using lasso-rfe feature selection.IEEE Access, 13:151452–151465,","work_id":"8175d85b-f000-4633-aa25-65c12165f5e7","shared_citers":2},{"title":"Application of machine learning in caisson inclination prediction: Model performance comparison and interpretability analysis.Underground Space, 2025","work_id":"510d7df9-ba83-4771-9022-f7626c0a7941","shared_citers":2},{"title":"Application of tabpfn model on the energy performance improvement of high-power multistage centrifugal pump.Energy, 2025","work_id":"4f0bfbd2-2c08-48c8-bb98-63b192f88144","shared_citers":2},{"title":"Artificial intelligence for predicting post-excision recurrence and malignant progression in oral potentially malignant disorders: a retrospective cohort study","work_id":"8cc63eb8-22fe-4136-a5ee-459aa8d2681c","shared_citers":2},{"title":"arXiv preprint arXiv:2506.10914 , year=","work_id":"7236d2c7-404d-4193-a35f-ebf5175d2048","shared_citers":2},{"title":"A target-specific machine learning framework for predicting fuel blend properties","work_id":"319482c7-3b08-4c39-9727-a827230b5d80","shared_citers":2},{"title":"Attention is all you need.Advances in neural information processing systems, 30","work_id":"751efe07-5e91-415c-b3d1-f4734aa26960","shared_citers":2},{"title":"Attention is all you need.Advances in neural information processing systems, 30, 2017","work_id":"869861e6-4680-4369-bac3-12785d201be1","shared_citers":2},{"title":"Autoenergy: An automated feature engineering algorithm for energy consumption forecasting with automl.Knowledge-Based Systems, 2025","work_id":"79d71a76-edcf-4bd5-867b-6d678c2fdd5f","shared_citers":2},{"title":"AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data","work_id":"32ca4e6c-bd72-4586-8594-40eb6bcb6582","shared_citers":2},{"title":"Automated supervised identification of thunderstorm ground enhancements (tges).arXiv preprint arXiv:2510.25125, 2025","work_id":"3ee77b7f-a718-48e2-b004-03d1ff8e56d6","shared_citers":2}],"time_series":[{"n":2,"year":2025},{"n":8,"year":2026}],"dependency_candidates":[{"n":1,"role":"method","polarity":"use_method","paper_title":"Environment-Adaptive Preference Optimization for Wildfire Prediction","primary_cat":"cs.LG","context_text":"Extreme wildfires are challenging to predict [41], as they emerge from the complex interplay of fire weather [9, 10, 37], topography [33], vegetation fuels [32, 37], and human factors such as ignition and fire suppression [16, 19, 26, 44], all of which are difficult to fully represent in process-based wildfire models. Whereas machine learning (ML) approaches such as XGBoost [8] have shown promise in wildfire prediction [5, 18, 21, 25, 42], outperforming process-based wildfire models [41], they typically require extensive historical fire data for training and may struggle to generalize to novel fire regimes that emerge under climate change [ 14]. As climate change causes shifts in the spatial pattern, seasonality, and statistical distribution of fire","citing_arxiv_id":"2605.12435"},{"n":1,"role":"baseline","polarity":"baseline","paper_title":"FAME: Forecasting Academic Impact via Continuous-Time Manifold Evolution","primary_cat":"cs.LG","context_text":"wi for each paper by log-normalizing and aggregating its GitHub stars, citation counts, influential citations, and Altmetric score. The distribution of the four log-normalized ground-truth impact metrics utilized in the dataset is shown in Figure 4. Baselines.We benchmark FAME against three distinct categories of evaluators. First, we evaluate ML models, including XGBoost [9], SVR [11, 27], Transformer [31] and TGCN [39], trained directly 5 Table 1: Prospective forecasting performance across an 18-month sliding window evaluation from June 2024 to November 2025. Performance is measured by the Spearman rank correlationρs between predicted and ground-truth composite impact weights. The experiments are carried out three times,","citing_arxiv_id":"2605.07208"},{"n":1,"role":"method","polarity":"use_method","paper_title":"Quantifying Exposure Information Uncertainty in Regional Risk Assessment","primary_cat":"stat.AP","context_text":"a: Single-span bridges do not have a bent and are not included in the ML-based predictions. b: There is a small proportion of bridges that have more than 7 columns per bent. These bridges are considered as outliers and have been excluded before the model training. Figure 7: Proposed classifier chain for imputing missing attributes. An XGBoost classifier [40] is used as the predictive model to impute the four target attributes. The hyperparameters of each classifier are tuned using Bayesian optimization [41], which is conducted within a stratified five-fold cross- validation framework. Early stopping is employed during the training by monitoring validation performance within each fold, such that the boosting process is terminated when no further improvement is observed after a specified number","citing_arxiv_id":"2605.08272"},{"n":1,"role":"baseline","polarity":"baseline","paper_title":"BoostLLM: Boosting-inspired LLM Fine-tuning for Few-shot Tabular Classification","primary_cat":"cs.LG","context_text":"Despite their efficiency, such adaptation can be unstable and prone to overfitting when supervision is scarce [48, 10, 23], highlighting the need for more data-efficient and robust fine-tuning strategies. Interestingly, the tabular learning community has long relied on a different paradigm to address similar challenges: gradient boosting. Systems such as XGBoost [ 5], LightGBM [20], and CatBoost [35] consistently achieve strong performance across a wide range of tabular tasks and are known to be particularly robust in data-limited settings [12, 13, 49, 8, 36]. Boosting constructs models in a stage-wise manner, where learners are added sequentially to correct the residual errors of previous ones. This additive training strategy encourages models to focus on informative residual errors","citing_arxiv_id":"2605.06117"}]},"authors":[{"id":"5ee1a1f4-ade6-444d-a2ba-aa84c735be28","orcid":null,"display_name":"Carlos Guestrin","source":"manual","import_confidence":0.72},{"id":"a9e3ca51-d13e-40d6-9fff-7d3d53ad8b1c","orcid":null,"display_name":"Tianqi Chen","source":"manual","import_confidence":0.72}]}}