Ensembles at Any Cost? Accuracy-Energy Trade-offs in Recommender Systems

Jannik Nitschke; Joeran Beel; Lukas Wegmeth

arXiv: 2604.07869 · v1 · submitted 2026-04-09 · cs.IR · cs.LG

Pith reviewed 2026-05-10 18:29 UTC · model grok-4.3

keywords: recommender systems, ensemble methods, energy efficiency, accuracy trade-offs, carbon emissions, machine learning sustainability, MovieLens

The pith

Ensemble methods improve recommender accuracy by 0.3% to 5.7% but raise energy use by 19% to 2,549%.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether ensemble methods justify their accuracy gains in recommender systems when whole-system energy consumption is taken into account. It runs 93 controlled experiments across two pipelines and four datasets, comparing four ensemble strategies to strong single-model baselines while measuring energy with a smart plug and converting results to CO2 equivalents. Across settings the ensembles deliver only modest accuracy lifts yet multiply energy draw and emissions, sometimes by more than twenty times. Selective strategies that combine only the best models prove far more efficient than exhaustive averaging of all predictors. A sympathetic reader would care because recommender systems run at massive scale, where even small per-query energy differences affect operating costs and environmental impact.

Core claim

Across explicit-rating and implicit-feedback tasks, ensemble methods raise accuracy by 0.3% to 5.7% while increasing energy consumption by 19% to 2,549%. Selective ensembles such as Top Performers are more energy efficient than exhaustive averaging. On the Anime dataset, a Surprise Top Performers ensemble uses about 21 times the energy of its single-model baseline, and the LensKit ensembles fail due to memory limits.

What carries the argument

Direct comparison of Average, Weighted, Stacking/Rank Fusion, and Top Performers ensembles against optimized single models (SVD++, etc.) using RMSE and NDCG@10, with whole-system energy captured by EMERS smart-plug measurements and converted to CO2 equivalents.
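To make the selective-versus-exhaustive distinction concrete, here is a minimal Python sketch of three of the rating-prediction strategies named above. Each one operates on predictions that a set of already-trained models has produced for the same test pairs. It is not the paper's code: the inverse-RMSE weighting, the default k, and the model names and numbers in the usage example are assumptions, and Stacking (a learned meta-model) is omitted. The sketch does make one design point visible. Once selection is done, Top Performers only needs predictions from k members, whereas averaging needs all of them. Selection itself still requires validation scores for every candidate.

```python
import math

# Hypothetical helpers: each model's predictions are a list aligned with the same test pairs.


def rmse(preds, truth):
    """Root-mean-square error between predicted and true ratings."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(preds, truth)) / len(truth))


def average_ensemble(model_preds):
    """Exhaustive averaging: unweighted mean over every member model."""
    columns = zip(*model_preds.values())
    return [sum(col) / len(model_preds) for col in columns]


def weighted_ensemble(model_preds, val_rmse):
    """Weighted mean with weights proportional to 1 / validation RMSE (an assumed scheme)."""
    inverse = {name: 1.0 / val_rmse[name] for name in model_preds}
    total = sum(inverse.values())
    names = list(model_preds)
    columns = zip(*(model_preds[name] for name in names))
    return [sum(inverse[n] / total * p for n, p in zip(names, col)) for col in columns]


def top_performers_ensemble(model_preds, val_rmse, k=2):
    """Selective: average only the k members with the lowest validation RMSE."""
    best = sorted(model_preds, key=lambda name: val_rmse[name])[:k]
    return average_ensemble({name: model_preds[name] for name in best})


# Usage with made-up numbers (not results from the paper).
truth = [4.0, 2.0, 5.0]
preds = {"svdpp": [3.9, 2.1, 4.8], "knn": [3.5, 2.6, 4.2], "baseline": [3.2, 3.0, 3.9]}
val_rmse = {"svdpp": 0.86, "knn": 0.92, "baseline": 0.94}
for label, ens in [
    ("average", average_ensemble(preds)),
    ("weighted", weighted_ensemble(preds, val_rmse)),
    ("top-2", top_performers_ensemble(preds, val_rmse, k=2)),
]:
    print(label, round(rmse(ens, truth), 4))
```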

If this is right

Selective ensembles such as Top Performers deliver accuracy gains at far lower energy overhead than exhaustive averaging.
On MovieLens 1M a Top Performers ensemble improves RMSE by 0.96% at only 18.8% extra energy.
On Anime an ensemble improves RMSE by 1.2% yet consumes 2,005% more energy and raises emissions from 2.6 to 53.8 mg CO2 equivalents.
On Anime, the largest dataset, the LensKit ensembles fail due to memory limits.
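The two examples above, together with the MovieLens 100K averaging result from the abstract, can be put on a common footing by dividing the relative accuracy gain by the relative energy overhead. This ratio is an illustrative statistic, not one the paper reports. It also mixes RMSE-based and NDCG@10-based gains, which are not directly comparable. Its only purpose is to show how unevenly the accuracy is bought.

```python
def gain_per_overhead(accuracy_gain_pct, energy_overhead_pct):
    """Relative accuracy gain (%) obtained per 100% of extra energy."""
    return 100.0 * accuracy_gain_pct / energy_overhead_pct


# (accuracy gain %, energy overhead %) as quoted in the summary and abstract.
reported = {
    "MovieLens 1M, Top Performers, RMSE": (0.96, 18.8),
    "MovieLens 100K, averaging, NDCG@10": (5.7, 103.0),
    "Anime, Surprise Top Performers, RMSE": (1.2, 2005.0),
}

for setting, (gain, overhead) in reported.items():
    print(f"{setting}: {gain_per_overhead(gain, overhead):.2f}% gain per 100% extra energy")
```

The three settings come out at about 5.11, 5.53, and 0.06, so the Anime ensemble buys roughly two orders of magnitude less accuracy per unit of extra energy than the MovieLens ensembles.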

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Designers of production recommenders may need to treat energy as a first-class constraint rather than an afterthought.
Carbon accounting for recommender pipelines could be extended to include training as well as inference stages.
Hardware-aware or approximate ensemble techniques might narrow the energy gap without sacrificing accuracy.

Load-bearing premise

The smart-plug readings of whole-system energy accurately isolate the computational cost of the recommender pipelines without confounding effects from hardware variability or measurement noise.
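This premise matters because a smart plug reports whole-system power, not per-process energy. The minimal sketch below shows one way such readings could be turned into a per-run figure. It assumes timestamped power samples in watts and a separately measured idle draw. The samples are integrated with the trapezoidal rule, converted from joules to Wh (1 Wh = 3,600 J), and reduced by what the idle machine would have drawn over the same span. This is not EMERS's documented procedure: the function names and the idle-subtraction step are assumptions. If samples arrive about a second apart (also an assumption), a run lasting a few seconds yields only a handful of points, which is one reason short runs are fragile.

```python
def energy_wh(timestamps_s, power_w):
    """Trapezoidal integral of power samples (W) over time (s), returned in Wh."""
    joules = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        joules += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return joules / 3600.0


def net_energy_wh(timestamps_s, power_w, idle_power_w):
    """Whole-system energy minus an idle baseline drawn over the same interval (assumed method)."""
    duration_s = timestamps_s[-1] - timestamps_s[0]
    return energy_wh(timestamps_s, power_w) - idle_power_w * duration_s / 3600.0


# Hypothetical one-second samples from a short run on a machine idling at 30 W.
t = [0.0, 1.0, 2.0, 3.0]
p = [30.0, 45.0, 45.0, 30.0]
print(f"gross {energy_wh(t, p):.4f} Wh, net {net_energy_wh(t, p, 30.0):.4f} Wh")
```

In this toy run the gross reading is 120 J (about 0.033 Wh). Under the idle-subtraction assumption, only 30 J (about 0.008 Wh) is attributable to the workload. The same run could therefore be reported with a fourfold difference depending on the accounting.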

What would settle it

A controlled run in which an ensemble achieves the reported accuracy gains while consuming no more energy than the strongest single model baseline would falsify the central trade-off claim.

Original abstract

Ensemble methods are frequently used in recommender systems to improve accuracy by combining multiple models. Recent work reports sizable performance gains, but most studies still optimize primarily for accuracy and robustness rather than for energy efficiency. This paper measures accuracy-energy trade-offs of ensemble techniques relative to strong single models. We run 93 controlled experiments in two pipelines: (1) explicit rating prediction with Surprise (RMSE) and (2) implicit-feedback ranking with LensKit (NDCG@10). We evaluate four datasets ranging from 100,000 to 7.8 million interactions (MovieLens 100K, MovieLens 1M, ModCloth, Anime). We compare four ensemble strategies (Average, Weighted, Stacking or Rank Fusion, Top Performers) against baselines and optimized single models. Whole-system energy is measured with EMERS using a smart plug and converted to CO2 equivalents. Across settings, ensembles improve accuracy by 0.3% to 5.7% while increasing energy by 19% to 2,549%. On MovieLens 1M, a Top Performers ensemble improves RMSE by 0.96% at an 18.8% energy overhead over SVD++. On MovieLens 100K, an averaging ensemble improves NDCG@10 by 5.7% with 103% additional energy. On Anime, a Surprise Top Performers ensemble improves RMSE by 1.2% but consumes 2,005% more energy (0.21 vs. 0.01 Wh), increasing emissions from 2.6 to 53.8 mg CO2 equivalents, and LensKit ensembles fail due to memory limits. Overall, selective ensembles are more energy efficient than exhaustive averaging, …
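The Anime figures in the abstract can be cross-checked with a little arithmetic. Because mg/Wh and g/kWh are the same unit, dividing emissions by energy gives the grid carbon intensity that the conversion implies. The rounded values give about 256 to 260 g CO2e/kWh. The actual emission factor is not stated in this summary, so treat that range as an inference. The same sketch shows that the rounded energies give a 2,000% increase rather than the reported 2,005%, and that the emissions rise by roughly 1,969%. This suggests the percentages were computed from unrounded measurements.

```python
def pct_change(new, old):
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old


def implied_intensity_g_per_kwh(emissions_mg, energy_wh):
    """mg CO2e per Wh equals g CO2e per kWh (both scale by 1,000)."""
    return emissions_mg / energy_wh


single_wh, ensemble_wh = 0.01, 0.21  # rounded energies from the abstract
single_mg, ensemble_mg = 2.6, 53.8   # emissions from the abstract

print(f"energy change:    {pct_change(ensemble_wh, single_wh):.0f}%")
print(f"emissions change: {pct_change(ensemble_mg, single_mg):.0f}%")
print(f"implied intensity: {implied_intensity_g_per_kwh(single_mg, single_wh):.0f} "
      f"and {implied_intensity_g_per_kwh(ensemble_mg, ensemble_wh):.0f} g CO2e/kWh")
```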

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

Ensembles give small accuracy lifts in recsys at high reported energy cost, but smart-plug readings on short runs likely mix in too much overhead to trust the exact numbers.

The punchline is that this paper quantifies how ensembles in recommender systems buy small accuracy gains with large energy penalties, but the energy data come from smart-plug measurements that look unreliable for the short runs involved.

What the work does well is execute a broad set of 93 controlled experiments. It covers explicit rating prediction with the Surprise library and implicit-feedback ranking with LensKit, across four datasets from 100K to 7.8M interactions. The authors test four ensemble approaches against optimized single models, track whole-system energy draw with an EMERS smart plug, and convert the results to CO2 equivalents. This yields specific trade-off numbers, such as a 0.96% RMSE improvement on MovieLens 1M at 18.8% extra energy, or a 2,005% energy increase for a 1.2% RMSE gain on Anime. It also shows that selective ensembles like Top Performers use less energy than exhaustive averaging.

The main concern is whether those energy figures actually reflect the computational cost of the ensembles. A baseline as low as 0.01 Wh (36 J) implies a run lasting only seconds at whole-system power draw. At that scale, fixed overheads from Python startup, data loading, and OS activity would dominate the readings, making it hard to attribute reported increases such as 2,005% on Anime to the models themselves. There are no error bars, no repeated trials with variance estimates, and no cross-validation against CPU-level energy counters. This leaves the central trade-off numbers open to doubt, especially the extreme upper bounds.

The paper is relevant for anyone working on energy-efficient machine learning, particularly in recommender systems, where ensembles are popular. Readers interested in green computing will find the concrete measurements a useful starting point, even if they need more rigorous validation. I would recommend sending this to peer review. The empirical scope is decent and the topic is important for sustainable AI, but the measurement methodology needs tightening before the results can be relied upon.

Referee Report

3 major / 2 minor

Summary. The manuscript reports results from 93 controlled experiments comparing four ensemble strategies (Average, Weighted, Stacking/Rank Fusion, Top Performers) against single models in two recommender pipelines: explicit rating prediction using Surprise (RMSE) and implicit feedback ranking using LensKit (NDCG@10). Experiments span four datasets (MovieLens 100K, MovieLens 1M, ModCloth, Anime) with whole-system energy measured via EMERS smart plug and converted to CO2 equivalents. The central claim is that ensembles deliver accuracy gains of 0.3%–5.7% at energy overheads of 19%–2,549%, with selective ensembles (e.g., Top Performers) more efficient than exhaustive averaging; specific examples include a 0.96% RMSE improvement at 18.8% energy cost on MovieLens 1M and a 5.7% NDCG@10 gain at 103% extra energy on MovieLens 100K.
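For readers less familiar with the ranking pipeline, here is a minimal sketch of its two ingredients. The first is binary-relevance NDCG@10: the discounted cumulative gain of the top 10 items divided by the best achievable value. The second is one common way to fuse rankings from several models. The summary does not say which rule the paper's Rank Fusion uses, so reciprocal rank fusion with the constant k = 60 is swapped in here as an assumption. LensKit's own metric implementation may also differ in details such as tie handling. The item ids and relevance set are hypothetical.

```python
import math


def ndcg_at_k(ranked_items, relevant, k=10):
    """Binary-relevance NDCG@k: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(ranked_items[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def reciprocal_rank_fusion(rankings, k=60, top_n=10):
    """Score each item by the sum of 1 / (k + rank) over all input rankings (rank starts at 1)."""
    scores = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by item id for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]


# Hypothetical top lists from two models and a held-out set of relevant items.
model_a = ["i1", "i2", "i3", "i4"]
model_b = ["i2", "i5", "i1", "i6"]
relevant = {"i2", "i5"}
fused = reciprocal_rank_fusion([model_a, model_b])
for name, ranking in [("model_a", model_a), ("model_b", model_b), ("fused", fused)]:
    print(name, round(ndcg_at_k(ranking, relevant), 3))
```

In this toy case, the fused list scores between the two inputs, a reminder that fusion does not guarantee a gain over the best member.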

Significance. If the energy measurements can be validated as isolating marginal pipeline costs, the work would be significant for quantifying sustainability trade-offs in recommender systems, an area where accuracy has historically dominated. The broad experimental scope across datasets, libraries, and ensemble types provides practical evidence that modest accuracy gains can incur large energy penalties, supporting calls for energy-aware design in IR. Measuring energy directly rather than estimating it from proxies is a strength, but only once the methodological controls are tightened.

major comments (3)

[Energy measurement methodology] As described for the Anime dataset, the 0.01 Wh baseline rising to 0.21 Wh (a 2,005% increase) and similarly large percentages (up to 2,549%) rest on whole-system smart-plug readings of short runs. No repeated trials, variance estimates, or cross-validation against CPU-level counters (RAPL, perf) are reported, so fixed overheads (interpreter startup, data loading, OS activity) likely dominate and prevent confident attribution of the reported overheads to ensemble computation itself. This directly undermines the accuracy-energy trade-off numbers that form the paper's central claim.
[Results section] Accuracy improvements (0.3%–5.7%) and energy differences are presented without error bars, confidence intervals, or statistical tests (e.g., paired t-tests or Wilcoxon signed-rank tests). With 93 experiments but no assessment of variability or significance, it is impossible to determine whether the reported gains exceed measurement noise, especially given the small absolute energy values involved.
[Experimental setup] Insufficient detail is given on hyperparameter tuning for baselines and ensembles, exact run durations, hardware configuration, and controls for background processes. This prevents verification that the 93 experiments are truly controlled and reproducible, which is load-bearing for claims of consistent trade-offs across pipelines and datasets.
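The first major comment can be made concrete with a toy model in which each measured run equals its compute energy plus fixed costs. The numbers below are hypothetical. The split into "shared" and "ensemble-only" fixed costs is an assumption, not something the paper measures. Under this model, a fixed cost that both runs pay pulls the measured overhead toward zero. A cost that only the ensemble pays (for example, loading more models) pushes it up. Without repeated runs and a separately measured baseline, the two effects cannot be told apart in a single whole-system reading.

```python
def measured_overhead_pct(single_compute_wh, ensemble_compute_wh,
                          shared_fixed_wh=0.0, ensemble_only_fixed_wh=0.0):
    """Percentage overhead a whole-system meter would report under this toy model."""
    single_total = single_compute_wh + shared_fixed_wh
    ensemble_total = ensemble_compute_wh + shared_fixed_wh + ensemble_only_fixed_wh
    return 100.0 * (ensemble_total - single_total) / single_total


# Hypothetical energies: the ensemble does 10x the compute of the single model.
single, ensemble = 0.005, 0.05
print(f"compute only:         {measured_overhead_pct(single, ensemble):.0f}%")
print(f"+ shared fixed cost:  {measured_overhead_pct(single, ensemble, 0.005):.0f}%")
print(f"+ ensemble-only cost: {measured_overhead_pct(single, ensemble, 0.005, 0.02):.0f}%")
```

The same tenfold compute difference shows up as 900%, 450%, or 650%, depending only on where the fixed costs fall.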

minor comments (2)

[Abstract] The abstract claims selective ensembles are more energy efficient than exhaustive averaging, but the supporting quantitative comparisons are scattered; a consolidated table or figure summarizing overheads by ensemble type across all settings would improve clarity.
[Results] LensKit ensembles failing on Anime due to memory limits is noted but not explored further (e.g., via scaling experiments or alternative implementations); this limits the generalizability of the implicit-feedback results.
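The memory failure is reported but not explained. As a back-of-envelope aid only, suppose an implementation materialized a dense user × item score matrix for every member model. This is an assumption, not a description of LensKit's internals. Memory would then grow with users × items, multiplied by the number of ensemble members. The user and item counts below are hypothetical placeholders, not the Anime dataset's actual dimensions.

```python
def dense_scores_gib(n_users, n_items, n_models, bytes_per_score=8):
    """Memory for n_models dense float64 user x item score matrices, in GiB."""
    return n_users * n_items * n_models * bytes_per_score / 2**30


# Hypothetical sizes; substitute the real dataset's user and item counts.
for n_models in (1, 4):
    gib = dense_scores_gib(n_users=70_000, n_items=11_000, n_models=n_models)
    print(f"{n_models} model(s): {gib:.1f} GiB")
```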

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed review, which highlights important methodological improvements needed to strengthen the paper's claims on accuracy-energy trade-offs. We address each major comment below and describe the revisions planned for the next version of the manuscript.

Referee: Energy measurement methodology (as described for the Anime dataset): the 0.01 Wh baseline rising to 0.21 Wh (2,005% increase) and similar large percentages (up to 2,549%) rest on whole-system smart-plug readings for short runs. No repeated trials, variance estimates, or cross-validation against CPU-level counters (RAPL, perf) are reported, so fixed overheads (interpreter startup, data loading, OS activity) likely dominate and prevent confident attribution of the reported overheads to ensemble computation itself. This directly undermines the accuracy-energy trade-off numbers that form the paper's central claim.

Authors: We agree that the energy measurements, based on single whole-system smart-plug readings for short runs, are vulnerable to fixed overheads and lack reported variance, which reduces confidence in attributing exact percentage increases solely to ensemble computation. In the revised manuscript, we will add repeated trials (minimum of 5 runs per configuration) and report means with standard deviations for all energy values. We will also expand the methodology discussion to explicitly address the limitations of whole-system measurement, including the influence of non-computational overheads, and qualify the trade-off claims accordingly. Cross-validation against RAPL or perf was not performed in the original experiments; we will note this as a limitation and suggest it for future work, while maintaining that relative efficiency differences between selective and exhaustive ensembles remain directionally informative. revision: partial
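The promised "means with standard deviations" over at least five runs per configuration could be summarized as in the sketch below, which uses made-up readings. Reporting the coefficient of variation next to the overhead shows at a glance when the spread across runs is comparable to the effect being claimed. The function names are hypothetical.

```python
import statistics


def summarize_runs(readings_wh):
    """Mean, sample standard deviation and coefficient of variation of repeated runs."""
    mean = statistics.mean(readings_wh)
    sd = statistics.stdev(readings_wh)  # sample SD; needs at least two runs
    return mean, sd, sd / mean


def overhead_of_means_pct(ensemble_runs, single_runs):
    """Percentage energy overhead computed from the mean of each set of runs."""
    single_mean = statistics.mean(single_runs)
    return 100.0 * (statistics.mean(ensemble_runs) - single_mean) / single_mean


# Five hypothetical readings per configuration, in Wh.
single_runs = [0.010, 0.012, 0.009, 0.011, 0.010]
ensemble_runs = [0.021, 0.019, 0.024, 0.020, 0.022]
for label, runs in [("single", single_runs), ("ensemble", ensemble_runs)]:
    mean, sd, cv = summarize_runs(runs)
    print(f"{label}: {mean:.4f} ± {sd:.4f} Wh (CV {cv:.0%})")
print(f"overhead of means: {overhead_of_means_pct(ensemble_runs, single_runs):.0f}%")
```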
Referee: Results section: accuracy improvements (0.3%–5.7%) and energy differences are presented without error bars, confidence intervals, or statistical tests (e.g., paired t-tests or Wilcoxon signed-rank). With 93 experiments but no assessment of variability or significance, it is impossible to determine whether reported gains exceed measurement noise, especially given the small absolute energy values involved.

Authors: We acknowledge that the lack of error bars and statistical tests makes it difficult to assess whether the reported accuracy gains and energy differences exceed variability. In the revision, we will incorporate error bars based on the repeated measurements described above for both accuracy and energy metrics. We will also add statistical significance testing (e.g., paired t-tests for accuracy comparisons and appropriate tests for energy) to evaluate whether observed improvements are statistically meaningful beyond noise. revision: yes
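Testing with five runs comes with one caveat: an exact paired test has a floor on its p-value. The sketch below uses an exact sign-flip permutation test on paired differences. It is a stand-in chosen because it needs only the standard library, not the paired t-test or Wilcoxon test the rebuttal mentions. With n = 5 pairs there are only 2^5 = 32 sign patterns. Even when every difference points the same way, the smallest two-sided p-value is therefore 2/32 = 0.0625, and the exact two-sided Wilcoxon signed-rank test has the same floor at n = 5. Five runs per configuration cannot reach p < 0.05 with such tests. At least six pairs would be needed (floor 2/64 = 0.03125). The energies are hypothetical.

```python
import itertools


def paired_permutation_pvalue(a, b):
    """Exact two-sided sign-flip permutation test on paired differences a[i] - b[i]."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    observed = abs(sum(diffs) / n)
    extreme = 0
    patterns = 0
    for signs in itertools.product((1, -1), repeat=n):
        flipped_mean = sum(s * d for s, d in zip(signs, diffs)) / n
        # Small tolerance so floating-point noise does not drop ties with the observed value.
        if abs(flipped_mean) >= observed - 1e-12:
            extreme += 1
        patterns += 1
    return extreme / patterns


# Hypothetical per-run energies (Wh) for an ensemble and a single model, paired by run.
ensemble = [0.021, 0.019, 0.024, 0.020, 0.022]
single = [0.010, 0.012, 0.009, 0.011, 0.010]
print(paired_permutation_pvalue(ensemble, single))  # 0.0625: the floor for five pairs
```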
Referee: Experimental setup: insufficient detail is given on hyperparameter tuning for baselines and ensembles, exact run durations, hardware configuration, and controls for background processes. This prevents verification that the 93 experiments are truly controlled and reproducible, which is load-bearing for claims of consistent trade-offs across pipelines and datasets.

Authors: We agree that additional detail is required for full reproducibility and verification of experimental controls. The revised manuscript will expand the experimental setup section to specify hyperparameter tuning procedures (including search ranges and selection criteria for baselines and ensembles), exact run durations, complete hardware specifications (CPU model, memory, OS), and controls for background processes (e.g., isolated execution environments and minimized system activity). We will also release the full experimental code, configurations, and data processing scripts to support independent verification. revision: yes
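Part of the promised hardware reporting could be automated by writing a small environment record next to each measurement. The minimal sketch below uses only Python's standard library, and its field names are hypothetical. `platform.processor()` can return an empty string on some Linux systems. Total RAM is not available portably from the standard library, so it would still need to be recorded separately, for example with psutil's `virtual_memory()`.

```python
import json
import os
import platform
import sys


def environment_record():
    """Collect basic, portable facts about the machine and interpreter."""
    return {
        "os": platform.platform(),
        "machine": platform.machine(),
        "processor": platform.processor(),  # may be empty on some Linux systems
        "logical_cpus": os.cpu_count(),
        "python": platform.python_version(),
        "executable": sys.executable,
    }


print(json.dumps(environment_record(), indent=2))
```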

Circularity Check

0 steps flagged

No circularity: purely empirical measurements with no derivations or fitted predictions

The paper consists entirely of direct empirical measurements from 93 controlled experiments across two recommender pipelines (Surprise and LensKit) and four datasets. It reports observed accuracy improvements (0.3%–5.7%) and energy increases (19%–2,549%) from ensemble strategies versus single models, using whole-system smart-plug readings converted to CO2 equivalents. No equations, derivations, first-principles results, fitted parameters, or predictions are claimed that could reduce to self-definitional inputs, fitted subsets, or self-citation chains. The central claims rest on raw experimental data rather than any mathematical reduction or ansatz. This is the expected outcome for a measurement-focused study; the skeptic's concerns address measurement validity (e.g., overhead dominance on short runs) but do not indicate circularity in any derivation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper is an empirical measurement study with no free parameters, axioms, or invented entities beyond standard experimental assumptions in machine learning evaluation.

Reference graph

Works this paper leans on

[1] R. Schwartz, J. Dodge, N. A. Smith, and O. Etzioni, “Green AI,” Communications of the ACM, vol. 63, no. 12, pp. 54–63, 2020.

[2] P. Henderson, J. Hu, J. Romoff, E. Brunskill, D. Jurafsky, and J. Pineau, “Towards the systematic reporting of the energy and carbon footprints of machine learning,” Journal of Machine Learning Research, vol. 21, no. 248, pp. 1–43, 2020.

[3] J. Beel, A. Said, T. Vente, and L. Wegmeth, “Green recommender systems: A call for attention,” SIGIR Forum, vol. 58, no. 2, pp. 1–5, 2024.

[4] L. Wegmeth, T. Vente, A. Said, and J. Beel, “Green recommender systems: Understanding and minimizing the carbon footprint of AI-powered personalization,” ACM Transactions on Recommender Systems, 2025.

[5] A. Bar, L. Rokach, G. Shani, B. Shapira, and A. Schclar, “Improving simple collaborative filtering models using ensemble methods,” in Multiple Classifier Systems, ser. Lecture Notes in Computer Science, vol. …, Springer, 2013, pp. 1–12.

[6] R. Boim and T. Milo, “Methods for boosting recommender systems,” in 2011 IEEE 27th International Conference on Data Engineering Workshops, 2011, pp. 288–291.

[7] S. Forouzandeh, K. Berahmand, and M. Rostami, “Presentation of a recommender system with ensemble learning and graph embedding: A case on MovieLens,” Multimedia Tools and Applications, vol. 80, no. 5, pp. 7805–7832, 2021.

[8] T. Vente, Z. Mehta, L. Wegmeth, and J. Beel, “Greedy ensemble selection for top-n recommendations,” in Proceedings of the RobustRecSys Workshop at the 18th ACM Conference on Recommender Systems, vol. …, Bari, Italy: CEUR-WS.org, 2024.

[9] L. Purucker and J. Beel, “Assembled-OpenML: Creating efficient benchmarks for ensembles in AutoML with OpenML,” in AutoML Conference 2022 Workshop Track, 2022.

[10] ——, “CMA-ES for post hoc ensembling in AutoML: A great success and salvageable failure,” in AutoML Conference 2023, 2023.

[11] L. Purucker, L. Schneider, M. Anastacio, J. Beel, B. Bischl, and H. Hoos, “Q(D)O-ES: Population-based quality (diversity) optimisation for post hoc ensemble selection in AutoML,” in AutoML Conference 2023, 2023.

[12] G. Spillo, A. D. Filippo, C. Musto, M. Milano, and G. Semeraro, “Towards sustainability-aware recommender systems: Analyzing the trade-off between algorithms performance and carbon footprint,” in Proceedings of the 17th ACM Conference on Recommender Systems, 2023, pp. 856–862.

[13] ——, “Towards green recommender systems: Investigating the impact of data reduction on carbon footprint and algorithm performances,” in Proceedings of the 18th ACM Conference on Recommender Systems, 2024, pp. 866–871.

[14] A. Plaza, J. C. Gil, and D. Parra, “14 kg of CO2: Analyzing the carbon footprint and performance of session-based recommendation algorithms,” in Recommender Systems for Sustainability and Social Good, L. Boratto, A. D. Filippo, E. Lex, and F. Ricci, Eds. Cham: Springer Nature Switzerland, 2025, pp. 123–134.

[15] A. Purificato and F. Silvestri, “Eco-aware graph neural networks for sustainable recommendations,” in Recommender Systems for Sustainability and Social Good, L. Boratto, A. D. Filippo, E. Lex, and F. Ricci, Eds. Cham: Springer Nature Switzerland, 2025, pp. 111–122.

[16] G. Spillo, A. G. Valerio, F. Franchini, A. D. Filippo, C. Musto, M. Milano, and G. Semeraro, “RecSys Carbonator: Predicting carbon footprint of recommendation system models,” in Recommender Systems for Sustainability and Social Good, L. Boratto, A. D. Filippo, E. Lex, and F. Ricci, Eds. Cham: Springer Nature Switzerland, 2025, pp. 98–110.

[17] T. Vente, L. Wegmeth, A. Said, and J. Beel, “From clicks to carbon: The environmental toll of recommender systems,” in Proceedings of the 18th ACM Conference on Recommender Systems, 2024, pp. 580–590.

[18] T. Vente, L. Purucker, and J. Beel, “The feasibility of greedy ensemble selection for automated recommender systems,” COSEAL Workshop 2022, 2022.

[19] T. Vente, L. Wegmeth, and J. Beel, “The potential of AutoML for recommender systems,” in Adjunct Proceedings of the 33rd ACM Conference on User Modeling, Adaptation and Personalization, 2025, pp. 371–378.

[20] J. M. Decker and J. Beel, “Algorithm selection for recommender systems via meta-learning on algorithm characteristics,” arXiv preprint arXiv:2508.04419, 2025.

[21] L. Wegmeth, T. Vente, A. Said, and J. Beel, “EMERS: Energy meter for recommender systems,” in International Workshop on Recommender Systems for Sustainability and Social Good (RecSoGood) at the 18th ACM Conference on Recommender Systems, 2024.

[22] A. Arabzadeh, T. Vente, and J. Beel, “Green recommender systems: Optimizing dataset size for energy-efficient algorithm performance,” in Recommender Systems for Sustainability and Social Good, L. Boratto, A. D. Filippo, E. Lex, and F. Ricci, Eds. Cham: Springer Nature Switzerland, 2025, pp. 73–82.

[23] M. Baumgart, L. Wegmeth, T. Vente, and J. Beel, “e-fold cross-validation for recommender-system evaluation,” in Recommender Systems for Sustainability and Social Good, L. Boratto, A. D. Filippo, E. Lex, and F. Ricci, Eds. Cham: Springer Nature Switzerland, 2025, pp. 90–97.

[24] C. Mahlich, T. Vente, and J. Beel, “From theory to practice: Implementing and evaluating e-fold cross-validation,” in International Conference on Artificial Intelligence and Machine Learning Research, 2024.

[25] M. Jay, V. Ostapenco, L. Lefevre, D. Trystram, A.-C. Orgerie, and B. Fichel, “An experimental comparison of software-based power meters: Focus on CPU and GPU,” in Proceedings of the IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, 2023, pp. 106–118.

[26] T. Tornede, A. Tornede, J. Hanselle, F. Mohr, M. Wever, and E. Hullermeier, “Towards green automated machine learning: Status quo and future directions,” Journal of Artificial Intelligence Research, vol. 77, pp. 427–457, 2023.

[27] F. M. Harper and J. A. Konstan, “The MovieLens datasets: History and context,” ACM Transactions on Interactive Intelligent Systems, vol. 5, no. 4, pp. 19:1–19:19, 2015.

[28] N. Hug, “Surprise: A Python library for recommender systems,” Journal of Open Source Software, vol. 5, no. 52, p. 2174, 2020.

[29] M. D. Ekstrand, “LensKit for Python: Next-generation software for recommender systems experiments,” in Proceedings of the 29th ACM International Conference on Information & Knowledge Management, 2020, pp. 2999–3006.

[30] K. Jarvelin and J. Kekalainen, “Cumulated gain-based evaluation of IR techniques,” ACM Transactions on Information Systems, vol. 20, no. 4, pp. 422–446, 2002.

[31] J. Nitschke, “The environmental impact of ensemble techniques in recommender systems,” Bachelor’s thesis, University of Siegen, Siegen, Germany, 2025.
