Investment Ranking Challenge: Identifying the best performing stocks based on their semi-annual returns

Benjamin Harlander; Joe Byrum; Kirill Romanov; Lance Rane; Marcel Salathe; Mehmet Koseoglu; Pranoot Hatwar; Shanka Subhra Mondal; Sharada Prasanna Mohanty; Wei-Kai Liu

arxiv: 1906.08636 · v1 · pith:ZGLHQUKCnew · submitted 2019-06-20 · 💱 q-fin.ST · cs.LG

Pith reviewed 2026-05-25 19:02 UTC · model grok-4.3

classification 💱 q-fin.ST cs.LG

keywords stock ranking · investment challenge · neural networks · boosting algorithms · support vector machines · CNN LSTM · financial prediction · Spearman correlation

The pith

The top six entries in the 2018 investment ranking challenge succeeded with mixtures of neural networks, boosting algorithms, support vector machines, and CNN-LSTM hybrids.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper reports the methods used by the winning teams in a competition to rank stocks by their forward six-month returns. The organizers supplied anonymized predictors and historical semi-annual returns split into 42 non-overlapping periods, with performance judged by Spearman's rank correlation and normalized discounted cumulative gain on the top 20 percent of predictions. The six invited solutions showed that selecting data subsets, combining deep and shallow networks, applying various boosting methods, using linear support vector machines, and pairing convolutional and recurrent layers all produced competitive rankings on the held-out test period.
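For readers unfamiliar with the first metric, here is a minimal sketch of Spearman's rank correlation: rank both lists, then take the Pearson correlation of the ranks. The function names and the five-stock data are hypothetical, and the challenge's own scoring code is not shown in the source, so this is an illustration rather than the organizers' implementation.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(predicted, realized):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rp, rr = average_ranks(predicted), average_ranks(realized)
    mp, mr = mean(rp), mean(rr)
    cov = sum((a - mp) * (b - mr) for a, b in zip(rp, rr))
    var_p = sum((a - mp) ** 2 for a in rp)
    var_r = sum((b - mr) ** 2 for b in rr)
    return cov / (var_p * var_r) ** 0.5


# Hypothetical model scores and realized six-month returns for five stocks.
scores = [0.9, 0.1, 0.5, 0.7, 0.3]
returns = [0.12, -0.05, 0.02, 0.08, 0.04]
rho = spearman(scores, returns)
```

Only the ordering of the scores matters, so a model can output values on any scale. The sketch divides by zero if either input is constant; a production version would need to guard against that case.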

Core claim

The top six solutions in the investment ranking challenge used varied approaches based on selecting subsets of data, combinations of deep and shallow neural networks, different boosting algorithms, linear support vector machines, and combinations of CNN and LSTM.

What carries the argument

An ensemble of neural networks, gradient boosting variants, linear SVMs, and CNN-LSTM stacks trained on selected subsets of the anonymized financial predictors to output stock rankings.
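One common way to blend models whose outputs live on different scales is to average their ranks rather than their raw scores. The sketch below illustrates that idea only; the source does not say how the winning teams combined their models, and the model names and numbers are hypothetical.

```python
def to_ranks(scores):
    """Map scores to 0-based ranks (0 = lowest score); ties broken by position."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0] * len(scores)
    for rank, idx in enumerate(order):
        ranks[idx] = rank
    return ranks


def rank_average(model_scores):
    """Average per-model ranks so models on different scales can be blended."""
    per_model = [to_ranks(s) for s in model_scores]
    n_models = len(per_model)
    return [sum(col) / n_models for col in zip(*per_model)]


# Hypothetical outputs of three models for four stocks, on unrelated scales.
nn_scores = [0.2, 0.8, 0.5, 0.1]
gbm_scores = [12.0, 30.0, 25.0, 5.0]
svm_scores = [-1.0, 0.5, 1.5, -2.0]
blended = rank_average([nn_scores, gbm_scores, svm_scores])
```

A higher blended value means the stock sits nearer the top of the combined ranking. Because the challenge metrics depend on ordering, rank averaging avoids having to calibrate the individual models against each other.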

If this is right

Hybrid networks can combine local pattern detection from CNN layers with longer memory from LSTM layers for return forecasting.
Boosting methods remain competitive even when predictors are anonymized and the target is a six-month ranking.
Linear support vector machines can serve as a lightweight component inside larger ranking ensembles.
Training on carefully chosen data subsets improves out-of-sample ranking stability across the 42 semi-annual windows.
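Stability across the semi-annual windows can be checked with a walk-forward scheme in which every validation period is preceded only by earlier training periods. The sketch below assumes an expanding training window and a hypothetical minimum of 10 training periods; the source does not describe the validation protocol any team actually used.

```python
def expanding_window_splits(n_periods, min_train=10):
    """Yield (train_periods, validation_period) pairs in time order.

    Each validation period is preceded only by earlier periods, so no
    future information leaks into training.
    """
    for val in range(min_train, n_periods):
        yield list(range(val)), val


# 42 semi-annual periods as described in the source; min_train is a hypothetical choice.
splits = list(expanding_window_splits(42, min_train=10))
first_train, first_val = splits[0]
last_train, last_val = splits[-1]
```

Scoring a model on each validation period and looking at the spread of the scores gives a rough picture of how stable its rankings are over time.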

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the competition metrics align with live performance, then practitioners could test whether similar hybrid models improve portfolio construction when applied to non-anonymized fundamental and price data.
The success of multiple distinct architectures suggests that the underlying signal in semi-annual returns may be accessible through several different inductive biases rather than one privileged model family.

Load-bearing premise

The anonymized predictors together with Spearman's correlation and top-20 percent NDCG serve as adequate stand-ins for identifying models that would produce useful rankings under live market conditions.

What would settle it

A follow-up evaluation in which the submitted models are run on a fresh set of stocks with observable forward returns and produce rankings whose correlation with actual performance falls to near zero.

Figures

Figures reproduced from arXiv: 1906.08636 by Benjamin Harlander, Joe Byrum, Kirill Romanov, Lance Rane, Marcel Salathe, Mehmet Koseoglu, Pranoot Hatwar, Shanka Subhra Mondal, Sharada Prasanna Mohanty, Wei-Kai Liu.

**Figure 1.** Stock Return Prediction on Unseen Data. The prediction of the LightGBM model for 2017 first quarter stock returns is shown in …

**Figure 4.** Top performer features. 2) Aggregation of basic features, usage of synthetic features and application of dimensionality reduction techniques (PCA) improve the predictive models. Application of technical analysis does not help when we cannot catch the dynamics of single securities, as we can see in …

**Figure 6.** Window size depending on prediction period.

**Figure 7.** Block diagram for the framework. A. Method 1) Pre Processing: The input of the framework is a sequence of 70 attributes over a span of 6 months. So, before putting the data into the model we impute the NA values with zeros and then reshape the attributes into 1 x 6 x 70. 2) Convolutional Layers: The role of convolution layers in the framework is to extract higher dimensional features for every time step wh…

**Figure 9.** Distribution of scores for top six participants in order for Round 1 and Round 2.

**Figure 11.** Distribution of final scores for top six participants over Round 1 and …
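The pre-processing step quoted in the Figure 7 caption (impute NA values with zeros, then reshape to 1 x 6 x 70) can be sketched in plain Python as follows. The function name, the flat time-major input layout, and the sample data are assumptions made for illustration; the authors' actual code is not shown in the source.

```python
import math


def preprocess(flat_values, steps=6, n_attrs=70):
    """Replace missing values with 0.0 and reshape to 1 x steps x n_attrs.

    Assumes `flat_values` is ordered time step by time step (an assumption).
    """
    if len(flat_values) != steps * n_attrs:
        raise ValueError("expected steps * n_attrs values")
    cleaned = [
        0.0 if v is None or (isinstance(v, float) and math.isnan(v)) else float(v)
        for v in flat_values
    ]
    rows = [cleaned[t * n_attrs:(t + 1) * n_attrs] for t in range(steps)]
    return [rows]  # leading axis of length 1


# Hypothetical input: 6 time steps of 70 attributes with two gaps.
raw = [float(i) for i in range(420)]
raw[3] = float("nan")
raw[71] = None
tensor = preprocess(raw)
```

With an array library the same step is typically a NaN-to-zero replacement followed by a reshape. The nested-list version is used here only to keep the sketch dependency-free.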

Original abstract

In the IEEE Investment ranking challenge 2018, participants were asked to build a model which would identify the best performing stocks based on their returns over a forward six months window. Anonymized financial predictors and semi-annual returns were provided for a group of anonymized stocks from 1996 to 2017, which were divided into 42 non-overlapping six months period. The second half of 2017 was used as an out-of-sample test of the model's performance. Metrics used were Spearman's Rank Correlation Coefficient and Normalized Discounted Cumulative Gain (NDCG) of the top 20% of a model's predicted rankings. The top six participants were invited to describe their approach. The solutions used were varied and were based on selecting a subset of data to train, combination of deep and shallow neural networks, different boosting algorithms, different models with different sets of features, linear support vector machine, combination of convoltional neural network (CNN) and Long short term memory (LSTM).
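The NDCG metric named in the abstract rewards a model for placing the highest-gain stocks at the very top of its ranking. The sketch below restricts the computation to the top 20% of predictions. It assumes non-negative gains and the standard log2 position discount, and the function name and data are hypothetical, because the abstract does not give the challenge's exact gain definition.

```python
import math


def ndcg_top_fraction(predicted, relevance, fraction=0.2):
    """NDCG restricted to the top `fraction` of predicted rankings.

    Assumes `relevance` holds non-negative gains (an illustrative choice;
    the challenge's exact gain definition is not given in the source).
    """
    n = len(predicted)
    k = max(1, math.ceil(fraction * n))
    # Indices of the k stocks the model ranks highest.
    top_pred = sorted(range(n), key=lambda i: predicted[i], reverse=True)[:k]
    dcg = sum(relevance[i] / math.log2(pos + 2) for pos, i in enumerate(top_pred))
    # Best achievable DCG: the k largest gains in descending order.
    ideal = sorted(relevance, reverse=True)[:k]
    idcg = sum(g / math.log2(pos + 2) for pos, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical gains for ten stocks; the top 20% is therefore two stocks.
gains = [5, 4, 3, 2, 1, 0, 0, 0, 0, 0]
perfect = ndcg_top_fraction([10, 9, 8, 7, 6, 5, 4, 3, 2, 1], gains)
imperfect = ndcg_top_fraction([9, 1, 10, 7, 6, 5, 4, 3, 2, 8], gains)
```

A perfect ordering of the top two stocks scores 1.0, while the second call, which promotes the third-best stock to first place, scores strictly less. Raw returns can be negative, so a real scorer would need some mapping from returns to non-negative gains.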

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This is a short descriptive summary of the 2018 challenge winners' methods with no new results or analysis.

This paper reports on the IEEE Investment Ranking Challenge from 2018. It sets up the task of ranking stocks by forward six-month returns using anonymized predictors, splits the data into 42 periods from 1996 to 2017, and holds out the second half of 2017 for testing. The top six teams were invited to describe their entries, which relied on data subsetting, mixes of deep and shallow neural nets, boosting variants, linear SVMs, and CNN-LSTM combinations. Metrics were Spearman's rank correlation and top-20% NDCG. That is the full content.

Nothing in the paper is new. The listed techniques were already known to challenge participants, and the text adds no derivations, ablations, or fresh experiments. It does a clear job of recording the challenge rules and naming the broad categories of methods that placed highest.

The soft spots are straightforward. No performance numbers appear, so there is no way to check whether the listed approaches actually beat baselines on the hidden test set or by how much. The paper also offers no discussion of whether the chosen metrics or anonymized features would produce rankings that matter for actual portfolio decisions.

Because the work is purely descriptive and contains no scientific claim or reproducible result, it has limited value outside the immediate circle of challenge organizers and entrants. I would not bring it to a reading group or cite it. It does not rise to the level that would justify sending it out for peer review.

Referee Report

0 major / 2 minor

Summary. The manuscript reports on the IEEE Investment Ranking Challenge 2018, in which participants built models to identify top-performing stocks by their semi-annual returns. It describes the dataset of anonymized financial predictors and returns spanning 1996–2017 across 42 non-overlapping six-month periods, with the second half of 2017 held out as an out-of-sample test. Evaluation used Spearman's rank correlation and top-20% NDCG. The paper summarizes the heterogeneous approaches taken by the top six participants: data subset selection, combinations of deep and shallow neural networks, boosting algorithms, linear support vector machines, and CNN-LSTM hybrids.

Significance. As a competition report, the manuscript supplies an archival record of the methods that ranked highest under the stated metrics. Its primary value is documenting the range of standard ML techniques that proved effective for this ranking task; the absence of numerical scores or ablation details limits its utility for methodological comparison.

minor comments (2)

[Abstract] 'convoltional' is a typographical error and should read 'convolutional'.
The report would be strengthened by including the actual test-set scores (Spearman and NDCG) attained by each of the top six entries, even if only in summary form.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their review and recommendation of minor revision. The manuscript serves as an archival record of the top-performing approaches from the IEEE Investment Ranking Challenge 2018, and we appreciate the recognition of its value in documenting the range of effective ML techniques for this task.

Circularity Check

0 steps flagged

No significant circularity; purely descriptive competition report

The manuscript is a post-competition summary that reports participant approaches and metrics without advancing any derivation, model, or prediction of its own. No equations, fitted parameters, or load-bearing self-citations appear. The sole claim—that top entries used heterogeneous standard techniques—is observational and externally verifiable from the competition results themselves. No step reduces to a self-definition or fitted input renamed as a prediction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No mathematical model, free parameters, axioms, or invented entities appear in the abstract; the document is a competition summary without any derivation or theoretical claim.

pith-pipeline@v0.9.0 · 5742 in / 1103 out tokens · 32868 ms · 2026-05-25T19:02:01.024615+00:00 · methodology

Reference graph

Works this paper leans on

24 extracted references · 24 canonical work pages

[1] sklearn.linear_model.RidgeCV. scikit-learn 0.19.2 documentation.
[2] https://www.kaggle.com/c/two-sigma-financial-modeling
[3] sklearn.linear_model.BayesianRidge. scikit-learn 0.19.2 documentation.
[4] sklearn.linear_model.HuberRegressor. scikit-learn 0.19.2 documentation.
[5] sklearn.linear_model.LinearRegression. scikit-learn 0.19.2 documentation.
[6] sklearn.linear_model.Ridge. scikit-learn 0.19.2 documentation.
[7] sklearn.svm.LinearSVR. scikit-learn 0.19.2 documentation.
[8] Michel Ballings and Dirk Van den Poel. Kernel factory: An ensemble of kernel machines. Expert Systems with Applications, 40(8):2904–2913, 2013.
[9] Michel Ballings, Dirk Van den Poel, Nathalie Hespeels, and Ruben Gryp. Evaluating multiple classifiers for stock price direction prediction. Expert Systems with Applications, 42(20):7046–7056, 2015.
[10] Debasish Basak, Srimanta Pal, and Dipak Chandra Patranabis. Support vector regression. Neural Information Processing-Letters and Reviews, 11(10):203–224, 2007.
[11] Leo Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.
[12] Tianqi Chen and Carlos Guestrin. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 785–794. ACM, 2016.
[13] Corinna Cortes and Vladimir Vapnik. Support-vector networks. Machine Learning, 20(3):273–297, 1995.
[14] Anna Veronika Dorogush, Vasily Ershov, and Andrey Gulin. Catboost: gradient boosting with categorical features support.
[15] Thomas Fischer and Christopher Krauss. Deep learning with long short-term memory networks for financial market predictions. European Journal of Operational Research, 270(2):654–669, 2018.
[16] Yoav Freund and Robert E Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139, 1997.
[17] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
[18] Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. Lightgbm: A highly efficient gradient boosting decision tree. In Advances in Neural Information Processing Systems, pages 3146–3154, 2017.
[19] James Kennedy. Particle swarm optimization. Encyclopedia of Machine Learning, pages 760–766, 2010.
[20] Christopher Krauss, Xuan Anh Do, and Nicolas Huck. Deep neural networks, gradient-boosted trees, random forests: Statistical arbitrage on the S&P 500. European Journal of Operational Research, 259(2):689–702, 2017.
[21] Jigar Patel, Sahil Shah, Priyank Thakkar, and Ketan Kotecha. Predicting stock market index using fusion of machine learning techniques. Expert Systems with Applications, 42(4):2162–2172, 2015.
[22] Akhter Mohiuddin Rather, Arun Agarwal, and VN Sastry. Recurrent neural network and a hybrid model for prediction of stock returns. Expert Systems with Applications, 42(6):3234–3241, 2015.
[23] Yi Xiao, Jin Xiao, Fengbin Lu, and Shouyang Wang. Ensemble ANNs-PSO-GA approach for day-ahead stock e-exchange prices forecasting. International Journal of Computational Intelligence Systems, 6(1):96–114, 2013.
[24] Kamil Żbikowski. Using volume weighted support vector machines with walk forward testing and feature selection for the purpose of creating stock trading strategy. Expert Systems with Applications, 42(4):1797–1805, 2015.
