Investment Ranking Challenge: Identifying the best performing stocks based on their semi-annual returns
Pith reviewed 2026-05-25 19:02 UTC · model grok-4.3
The pith
The top six entries in the 2018 investment ranking challenge succeeded with mixtures of neural networks, boosting algorithms, support vector machines, and CNN-LSTM hybrids.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The top six solutions in the investment ranking challenge used varied approaches based on selecting subsets of data, combinations of deep and shallow neural networks, different boosting algorithms, linear support vector machines, and combinations of CNN and LSTM.
What carries the argument
Neural networks, gradient boosting variants, linear SVMs, and CNN-LSTM stacks, used singly or in combination across the six entries and trained on the anonymized financial predictors (in some cases on selected subsets of the data) to output stock rankings.
If this is right
- Hybrid networks can combine local pattern detection from CNN layers with longer memory from LSTM layers for return forecasting.
- Boosting methods remain competitive even when predictors are anonymized and the target is a six-month ranking.
- Linear support vector machines can serve as a lightweight component inside larger ranking ensembles.
- Training on carefully chosen data subsets improves out-of-sample ranking stability across the 42 semi-annual windows.
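One simple way to picture how heterogeneous models could sit inside a single ranking ensemble is rank averaging. The sketch below is an illustration only: the report does not say that any entrant blended models this way, and the model names, scores, and the `rank_average` helper are all hypothetical.

```python
def to_ranks(scores):
    """Map scores to ranks 0..n-1 (0 = lowest score); ties broken by position."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0] * len(scores)
    for rank, idx in enumerate(order):
        ranks[idx] = rank
    return ranks


def rank_average(model_scores, weights=None):
    """Blend several models' scores by a weighted average of per-stock ranks."""
    names = list(model_scores)
    if weights is None:
        weights = {name: 1.0 for name in names}
    total = sum(weights[name] for name in names)
    n = len(model_scores[names[0]])
    blended = [0.0] * n
    for name in names:
        for i, r in enumerate(to_ranks(model_scores[name])):
            blended[i] += weights[name] * r / total
    return blended


# Hypothetical predicted returns for four stocks from three model families.
scores = {
    "boosting": [0.20, 0.05, 0.11, -0.03],
    "linear_svr": [1.4, 0.2, 0.9, 0.1],
    "cnn_lstm": [0.6, 0.1, 0.7, 0.0],
}
blended = rank_average(scores)
# Stock indices ordered from best to worst blended rank.
ranking = sorted(range(len(blended)), key=lambda i: blended[i], reverse=True)
```

Averaging ranks rather than raw scores sidesteps the fact that each model family outputs values on a different scale, which is why it is a common choice when the target metric is itself rank-based.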
Where Pith is reading between the lines
- If the competition metrics align with live performance, then practitioners could test whether similar hybrid models improve portfolio construction when applied to non-anonymized fundamental and price data.
- The success of multiple distinct architectures suggests that the underlying signal in semi-annual returns may be accessible through several different inductive biases rather than one privileged model family.
Load-bearing premise
The anonymized predictors together with Spearman's correlation and top-20 percent NDCG serve as adequate stand-ins for identifying models that would produce useful rankings under live market conditions.
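To make the two metrics concrete, here is a minimal pure-Python sketch. Spearman's coefficient is computed in the standard way, as the Pearson correlation of ranks. The NDCG part rests on an assumption: the abstract does not specify the gain function, so this sketch uses each stock's rank by realized return as the gain, and the challenge may have defined it differently.

```python
import math


def average_ranks(values):
    """Return 1-based ranks of values, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(predicted, actual):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rp, ra = average_ranks(predicted), average_ranks(actual)
    n = len(rp)
    mp, ma = sum(rp) / n, sum(ra) / n
    cov = sum((x - mp) * (y - ma) for x, y in zip(rp, ra))
    var_p = sum((x - mp) ** 2 for x in rp)
    var_a = sum((y - ma) ** 2 for y in ra)
    return cov / math.sqrt(var_p * var_a)


def ndcg_top_fraction(predicted, actual, fraction=0.2):
    """NDCG over the top `fraction` of stocks by predicted score.

    Assumption: gain = rank by realized return (higher is better), which keeps
    gains non-negative even when returns are negative.
    """
    n = len(predicted)
    k = max(1, int(n * fraction))
    gains = average_ranks(actual)
    by_pred = sorted(range(n), key=lambda i: predicted[i], reverse=True)[:k]
    ideal = sorted(gains, reverse=True)[:k]
    dcg = sum(gains[i] / math.log2(pos + 2) for pos, i in enumerate(by_pred))
    idcg = sum(g / math.log2(pos + 2) for pos, g in enumerate(ideal))
    return dcg / idcg
```

The two metrics reward different things: Spearman's coefficient scores the whole ordering, while the top-20% NDCG only looks at the stocks a model ranks highest, so a model can do well on one and poorly on the other.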
What would settle it
A follow-up evaluation in which the submitted models are run on a fresh set of stocks with observable forward returns and produce rankings whose correlation with actual performance falls to near zero.
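Whether a correlation is "near zero" needs a reference point. One possible check, sketched here under assumptions, is a permutation test: shuffle the realized returns many times and count how often a random pairing matches or beats the observed Spearman coefficient. The function names, the number of permutations, and the no-ties shortcut formula are choices made for this illustration, not part of the paper.

```python
import random


def spearman_no_ties(x, y):
    """Spearman's rho via 1 - 6*sum(d^2)/(n(n^2-1)); valid only without ties."""
    n = len(x)
    rx = {i: r for r, i in enumerate(sorted(range(n), key=lambda i: x[i]))}
    ry = {i: r for r, i in enumerate(sorted(range(n), key=lambda i: y[i]))}
    d2 = sum((rx[i] - ry[i]) ** 2 for i in range(n))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


def permutation_p_value(predicted, realized, n_perm=2000, seed=0):
    """One-sided p-value: how often a shuffled pairing scores at least as well."""
    rng = random.Random(seed)
    observed = spearman_no_ties(predicted, realized)
    shuffled = list(realized)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if spearman_no_ties(predicted, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (hits + 1) / (n_perm + 1)
```

A large p-value on fresh stocks would be consistent with the rankings carrying no usable signal; a small one would argue against that reading, though it would say nothing about trading costs or other live-market frictions.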
Original abstract
In the IEEE Investment ranking challenge 2018, participants were asked to build a model which would identify the best performing stocks based on their returns over a forward six months window. Anonymized financial predictors and semi-annual returns were provided for a group of anonymized stocks from 1996 to 2017, which were divided into 42 non-overlapping six months period. The second half of 2017 was used as an out-of-sample test of the model's performance. Metrics used were Spearman's Rank Correlation Coefficient and Normalized Discounted Cumulative Gain (NDCG) of the top 20% of a model's predicted rankings. The top six participants were invited to describe their approach. The solutions used were varied and were based on selecting a subset of data to train, combination of deep and shallow neural networks, different boosting algorithms, different models with different sets of features, linear support vector machine, combination of convoltional neural network (CNN) and Long short term memory (LSTM).
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript reports on the IEEE Investment Ranking Challenge 2018, in which participants built models to identify top-performing stocks by their semi-annual returns. It describes the dataset of anonymized financial predictors and returns spanning 1996–2017 across 42 non-overlapping six-month periods, with the second half of 2017 held out as an out-of-sample test. Evaluation used Spearman's rank correlation and top-20% NDCG. The paper summarizes the heterogeneous approaches taken by the top six participants: data subset selection, combinations of deep and shallow neural networks, boosting algorithms, linear support vector machines, and CNN-LSTM hybrids.
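The data layout described in the summary can be sketched as a chronological split. In the code below, periods are indexed 0 to 41 with the last one held out, as the report states. The expanding-window validation folds are an assumption added for illustration (the report does not describe how entrants validated), and `min_train` and `step` are arbitrary values.

```python
N_PERIODS = 42  # non-overlapping six-month periods, per the report


def holdout_split(n_periods=N_PERIODS):
    """Return (train_periods, test_period); the last period is out-of-sample."""
    periods = list(range(n_periods))
    return periods[:-1], periods[-1]


def walk_forward_folds(train_periods, min_train=20, step=1):
    """Yield (fit_periods, validation_period) pairs that never look ahead.

    Assumption: expanding-window validation; min_train and step are
    illustrative values, not taken from any participant's write-up.
    """
    for end in range(min_train, len(train_periods), step):
        yield train_periods[:end], train_periods[end]


train, test = holdout_split()
folds = list(walk_forward_folds(train))
```

Keeping every validation period strictly after its fitting periods mirrors the structure of the held-out test, where the final six-month window is scored only after all earlier windows have been seen.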
Significance. As a competition report, the manuscript supplies an archival record of the methods that ranked highest under the stated metrics. Its primary value is documenting the range of standard ML techniques that proved effective for this ranking task; the absence of numerical scores or ablation details limits its utility for methodological comparison.
minor comments (2)
- [Abstract] 'convoltional' is a typographical error and should read 'convolutional'.
- The report would be strengthened by including the actual test-set scores (Spearman and NDCG) attained by each of the top six entries, even if only in summary form.
Simulated Author's Rebuttal
We thank the referee for their review and recommendation of minor revision. The manuscript serves as an archival record of the top-performing approaches from the IEEE Investment Ranking Challenge 2018, and we appreciate the recognition of its value in documenting the range of effective ML techniques for this task.
Circularity Check
No significant circularity; purely descriptive competition report
full rationale
The manuscript is a post-competition summary that reports participant approaches and metrics without advancing any derivation, model, or prediction of its own. No equations, fitted parameters, or load-bearing self-citations appear. The sole claim—that top entries used heterogeneous standard techniques—is observational and externally verifiable from the competition results themselves. No step reduces to a self-definition or fitted input renamed as a prediction.