An Enhanced Ad Event-Prediction Method Based on Feature Engineering

Saeid Soheily Khah; Yiming Wu

arxiv: 1907.01959 · v1 · pith:YAWU76IPnew · submitted 2019-07-03 · 💻 cs.LG · stat.ML

Pith reviewed 2026-05-25 10:00 UTC · model grok-4.3

keywords: ad event prediction · feature engineering · click-through rate · conversion rate · digital advertising · machine learning · real-time bidding

The pith

A new feature engineering approach for ad event prediction significantly outperforms existing methods on a large real-world marketing dataset.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a new efficient feature engineering method to enhance prediction of ad events such as clicks and conversions. This is evaluated using a large real-world event-based dataset from a running marketing campaign. The results show that the proposed approach outperforms alternative methods. A sympathetic reader would care because CTR and CVR are key metrics in digital advertising, affecting sponsored search, display ads, and real-time bidding systems. Accurate prediction can lead to more effective ad placements and better campaign performance.

Core claim

In this work, we introduce an enhanced method for ad event prediction (i.e. clicks, conversions) by proposing a new efficient feature engineering approach. A large real-world event-based dataset of a running marketing campaign is used to evaluate the efficiency of the proposed prediction algorithm. The results illustrate the benefits of the proposed ad event prediction approach, which significantly outperforms the alternative ones.

What carries the argument

The new efficient feature engineering approach that improves ad event prediction accuracy.

If this is right

More accurate prediction of clicks and conversions in digital ad campaigns.
Better optimization of real-time bidding strategies.
Improved evaluation of ad performance using CTR and CVR metrics.
Enhanced systems for sponsored search and display advertising.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The feature engineering could be adapted to other prediction tasks beyond advertising.
Focus on feature engineering might allow simpler models to achieve high performance.
Further testing on multiple campaigns would strengthen claims of generalizability.

Load-bearing premise

That the performance improvements come from the new feature engineering method itself rather than other unmentioned factors like the choice of model or tuning, and that results hold for other datasets.

What would settle it

Running the same prediction models with and without the proposed feature engineering on the dataset and finding no significant difference in performance.
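One way to run that check is a paired comparison: train the same model on the same folds with and without the engineered features, then ask whether the per-fold metric differences could plausibly be noise. The sketch below uses an exact sign-flip permutation test on hypothetical per-fold AUC values. The numbers, the function name `paired_sign_flip_pvalue`, and the choice of test are assumptions for illustration, not results or procedures taken from the paper.

```python
from itertools import product
from statistics import mean


def paired_sign_flip_pvalue(with_fe, without_fe):
    """Two-sided exact sign-flip test on paired per-fold metric differences."""
    diffs = [a - b for a, b in zip(with_fe, without_fe)]
    observed = abs(mean(diffs))
    # Under the null hypothesis each difference is equally likely to have either sign,
    # so we enumerate every sign assignment and count how many are at least as extreme.
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(mean(s * d for s, d in zip(signs, diffs)))
        if flipped >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical AUC per cross-validation fold (same folds, same model, same tuning).
auc_with = [0.742, 0.751, 0.738, 0.747, 0.755, 0.744]
auc_without = [0.731, 0.739, 0.735, 0.733, 0.741, 0.736]

p_value = paired_sign_flip_pvalue(auc_with, auc_without)
lift = mean(a - b for a, b in zip(auc_with, auc_without))
print(f"mean lift = {lift:.4f}, p = {p_value:.4f}")
```

With only six folds the smallest two-sided p-value this test can return is 2/64, so a convincing ablation would likely need more folds, repeated runs, or additional campaigns.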

Figures

Figures reproduced from arXiv: 1907.01959 by Saeid Soheily Khah, Yiming Wu.

Original abstract

In digital advertising, Click-Through Rate (CTR) and Conversion Rate (CVR) are very important metrics for evaluating ad performance. As a result, ad event prediction systems are vital and widely used for sponsored search and display advertising as well as Real-Time Bidding (RTB). In this work, we introduce an enhanced method for ad event prediction (i.e. clicks, conversions) by proposing a new efficient feature engineering approach. A large real-world event-based dataset of a running marketing campaign is used to evaluate the efficiency of the proposed prediction algorithm. The results illustrate the benefits of the proposed ad event prediction approach, which significantly outperforms the alternative ones.
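For readers new to the two metrics named in the abstract, here is a minimal Python sketch of how CTR and CVR are commonly computed from raw event counts. The function names and the example counts are hypothetical. Dividing conversions by clicks is an assumed convention; some teams divide by impressions instead, and the abstract does not say which definition the paper uses.

```python
def click_through_rate(clicks: int, impressions: int) -> float:
    """CTR = clicks / impressions; returns 0.0 when there are no impressions."""
    if impressions <= 0:
        return 0.0
    return clicks / impressions


def conversion_rate(conversions: int, clicks: int) -> float:
    """CVR = conversions / clicks (assumed convention); 0.0 when there are no clicks."""
    if clicks <= 0:
        return 0.0
    return conversions / clicks


# Hypothetical campaign counts, for illustration only.
impressions, clicks, conversions = 200_000, 500, 20
ctr = click_through_rate(clicks, impressions)
cvr = conversion_rate(conversions, clicks)
print(f"CTR = {ctr:.4%}, CVR = {cvr:.2%}")
```

Both rates are typically very small, which is why ad event prediction is usually treated as a heavily imbalanced classification problem.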

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Standard feature engineering for ad CTR/CVR on one campaign dataset with claimed gains but no visible ablations to credit the new method.

The paper describes a feature engineering approach for predicting ad events like clicks and conversions. It tests the method on a large proprietary dataset from an active marketing campaign and states that it beats the alternatives. The practical setting is the main positive: real campaign data rather than toy benchmarks, and the application area (sponsored search, RTB) is one where small lifts matter for revenue. That gives the work a clear audience in ad tech teams who need implementation details they can try on their own logs.

Beyond that, the contribution looks incremental. Feature engineering for CTR and CVR is already a crowded space with well-known tricks around crossings, embeddings, and time-based features; nothing in the abstract or title signals a first-principles change. The central claim of outperformance therefore rests on whether the experiments actually isolate the new engineering steps. The stress-test note is on point here: without ablations, clear baseline descriptions, or multiple datasets, it is hard to rule out that gains came from model choice, tuning, or dataset quirks rather than the proposed features. A single campaign also limits any claim to broader applicability.

This is the kind of paper that might interest practitioners looking for code-level ideas they can adapt, but it does not move the general literature. I would bring it to a reading group only if someone in the group works directly on advertising systems. It is not something I would cite in my own work. For peer review, an editor could reasonably send it out; the topic has commercial weight and the data is real, even if the experiments need tightening. Expect referees to ask for the missing controls and perhaps a public benchmark comparison.

Referee Report

2 major / 1 minor

Summary. The paper claims to introduce a new efficient feature engineering approach for ad event prediction (clicks and conversions) that significantly outperforms alternative methods when evaluated on a large real-world event-based dataset from a running marketing campaign.

Significance. If the performance gains can be rigorously attributed to the proposed feature engineering through proper ablations and comparisons, the work could have practical significance for improving sponsored search and RTB systems in digital advertising. However, the absence of detailed method descriptions and quantitative results in the abstract raises concerns about verifiability.

major comments (2)

[Abstract] The central claim that the proposed approach 'significantly outperforms the alternative ones' is stated without any quantitative results, error bars, baseline details, or description of the feature engineering method, making verification of the claim impossible.
[Abstract] No information is given on the base learner, the specific transformations introduced by the new feature engineering, the alternative methods being compared, or any ablation studies, so performance gains cannot be attributed to the proposed method rather than model choice or dataset artifacts.

minor comments (1)

[Abstract] The abstract refers to 'the results' without referencing any tables, figures, or specific metrics.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments regarding the abstract. We agree that additional details are needed to support the central claims and allow verification. We will revise the abstract in the next version to include quantitative results, method descriptions, baselines, and ablation references while preserving the manuscript's focus on the feature engineering approach.

Referee: [Abstract] The central claim that the proposed approach 'significantly outperforms the alternative ones' is stated without any quantitative results, error bars, baseline details, or description of the feature engineering method, making verification of the claim impossible.

Authors: We agree that the abstract as written does not provide these specifics. The revised abstract will report key quantitative metrics (such as AUC or log-loss improvements with standard deviations), name the base learner, outline the main feature transformations, list the alternative methods, and reference the ablation results that attribute gains to the proposed engineering.
revision: yes

Referee: [Abstract] No information is given on the base learner, the specific transformations introduced by the new feature engineering, the alternative methods being compared, or any ablation studies, so performance gains cannot be attributed to the proposed method rather than model choice or dataset artifacts.

Authors: The full manuscript details the base learner, the exact feature transformations, the compared methods, and the ablation studies that isolate the contribution of the new feature engineering. To address the abstract-level concern, the revision will briefly summarize these elements and note that ablations confirm the performance differences arise from the proposed transformations rather than model or data artifacts.
revision: yes
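The metrics named in the rebuttal (AUC and log-loss, reported with standard deviations) can be computed along the following lines. This is a minimal pure-Python sketch: the fold data is made up, the helper names `auc` and `log_loss` are hypothetical, and nothing here reflects numbers or evaluation code from the paper.

```python
import math
from statistics import mean, stdev


def log_loss(labels, probs, eps=1e-15):
    """Mean negative log-likelihood of binary labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)


def auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical held-out folds: (labels, predicted click probabilities).
folds = [
    ([1, 0, 0, 1, 0], [0.9, 0.2, 0.4, 0.6, 0.1]),
    ([0, 1, 0, 0, 1], [0.3, 0.7, 0.5, 0.2, 0.4]),
    ([0, 0, 1, 0, 1], [0.1, 0.6, 0.8, 0.3, 0.5]),
]
aucs = [auc(y, p) for y, p in folds]
losses = [log_loss(y, p) for y, p in folds]
print(f"AUC      = {mean(aucs):.3f} +/- {stdev(aucs):.3f}")
print(f"log-loss = {mean(losses):.3f} +/- {stdev(losses):.3f}")
```

The pairwise AUC loop is quadratic in the number of examples, which is fine for a sketch; on real campaign logs a sort-based implementation from an established library would be the more practical choice.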

Circularity Check

0 steps flagged

No derivation chain or self-referential steps present

The paper is an empirical ML study that proposes a feature engineering method for CTR/CVR prediction and reports outperformance on one real-world campaign dataset. The provided abstract and text contain no equations, no parameter-fitting steps presented as predictions, and no self-citations invoked as uniqueness theorems or load-bearing premises. The central claim rests on experimental comparison rather than any mathematical reduction to its own inputs, satisfying the criteria for a self-contained empirical result with no circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no technical details on free parameters, axioms, or invented entities are provided.

Reference graph

Works this paper leans on

45 extracted references · 45 canonical work pages · 3 internal anchors

[1]

An Enhanced Ad Event-Prediction Method Based on Feature Engineering

Introduction Ad event prediction is critical to many web applications including recommender systems, web search, sponsored search, and display advertising [1, 2, 3, 4, 5], and is a hot research direction in computational advertising [6, 7]. The event prediction is deﬁned to estimate the ratio of events such as videos, clicks or conversions to impressions ...

work page internal anchor Pith review Pith/arXiv arXiv 1907
[2]

State-of-the-art In the literature, a variety of classiﬁcation techniques such as logistic regression, support vector machine, (deep) neural network, nearest neighbor, naive Bayes, decision tree and random forest have been widely used as machine learning and data mining techniques for ad event prediction applications. Logistic regression contains many tec...

work page
[3]

In any artiﬁcial intelligence or machine learning algorithm (e.g

Feature engineering Feature engineering is the fundamental to the application of machine learning, data analysis and mining as well as mostly all artiﬁcial intelligence tasks, and generally, is diﬃcult, costly and expensive. In any artiﬁcial intelligence or machine learning algorithm (e.g. predictive and classiﬁcation models), the features in the data are...

work page
[4]

Typically, there are plenty of recorded information, attributes and measures in an executed marketing campaign

The design choices The proposed feature engineering strategy is brieﬂy presented in to the following steps (see Algorithm 1), where in the reminder of this section, we explain in detail the proposed feature learning approach for the ad event prediction. Typically, there are plenty of recorded information, attributes and measures in an executed marketing c...

work page
[5]

Experimental study In this section, we ﬁrst describe the dataset used to conduct our experiments, then specify the validation process, prior to present and discuss the results that we obtained. 5.1. Data description In this section, to clarify our claim in ad event prediction, we used a large real-world dataset of a running marketing campaign. The dataset...

work page
[6]

In this framework, we propose two statistical approaches which can be used for feature selection: i) the adjusted Chi-squared test and ii) the adjusted mutual information

Conclusion This research work introduces an enhanced ad event prediction framework which has been applied on big data. In this framework, we propose two statistical approaches which can be used for feature selection: i) the adjusted Chi-squared test and ii) the adjusted mutual information. Then, by ranking the statistical measures we select the best featu...

work page
[7]

Personalized click prediction in sponsored search,

H. Cheng and E. Cantú-Paz, “Personalized click prediction in sponsored search,” in Proceedings of the Third ACM International Conference on Web Search and Data Mining, WSDM ’10, (New York, NY, USA), pp. 351–360, ACM, 2010

work page 2010
[8]

Sequential click prediction for sponsored search with recurrent neural networks,

Y. Zhang, H. Dai, C. Xu, J. Feng, T. Wang, J. Bian, B. Wang, and T.-Y. Liu, “Sequential click prediction for sponsored search with recurrent neural networks,” in Proceedings of the Twenty-Eighth AAAI Conference on Artiﬁcial Intelligence, pp. 1369–1375, AAAI Press, 2014

work page 2014
[9]

Simple and scalable response prediction for display advertising,

O. Chapelle, E. Manavoglu, and R. Rosales, “Simple and scalable response prediction for display advertising,” ACM Trans. Intell. Syst. Technol., vol. 5, pp. 61:1–61:34, Dec. 2014

work page 2014
[10]

A neural click model for web search,

A. Borisov, I. Markov, M. de Rijke, and P. Serdyukov, “A neural click model for web search,” in Proceedings of the 25th International Conference on World Wide Web, pp. 531–541, 2016

work page 2016
[11]

Wide & deep learning for recommender systems,

H.-T. Cheng, L. Koc, J. Harmsen, T. Shaked, T. Chandra, H. Aradhye, G. Anderson, G. Corrado, W. Chai, M. Ispir, R. Anil, Z. Haque, L. Hong, V. Jain, X. Liu, and H. Shah, “Wide & deep learning for recommender systems,” in Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, DLRS 2016, (New York, NY, USA), pp. 7–10, ACM, 2016

work page 2016
[12]

Click-through prediction for advertising in twitter timeline,

C. Li, Y. Lu, Q. Mei, D. Wang, and S. Pandey, “Click-through prediction for advertising in twitter timeline,” in Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’15, pp. 1959–1968, ACM, 2015

work page 2015
[13]

Deep ctr prediction in display advertising,

J. Chen, B. Sun, H. Li, H. Lu, and X.-S. Hua, “Deep ctr prediction in display advertising,” in Proceedings of the 24th ACM International Conference on Multimedia, MM ’16, (New York, NY, USA), pp. 811–820, ACM, 2016

work page 2016
[14]

Predicting clicks: Estimating the click-through rate for new ads,

M. Richardson, E. Dominowska, and R. Ragno, “Predicting clicks: Estimating the click-through rate for new ads,” in Proceedings of the 16th International Conference on World Wide Web, WWW ’07, (New York, NY, USA), pp. 521–530, ACM, 2007

work page 2007
[15]

Spatio-temporal models for estimating click-through rate,

D. Agarwal, B. C. Chen, and P. Elango, “Spatio-temporal models for estimating click-through rate,” in WWW ’09: Proceedings of the 18th international conference on World wide web, (New York, NY, USA), pp. 21–30, ACM, 2009

work page 2009
[16]

Web-scale bayesian click-through rate prediction for sponsored search advertising in microsoft’s bing search engine,

T. Graepel, J. Quiñonero Candela, T. Borchert, and R. Herbrich, “Web-scale bayesian click-through rate prediction for sponsored search advertising in microsoft’s bing search engine,” in Proceedings of the 27th International Conference on International Conference on Machine Learning, ICML’10, (USA), pp. 13–20, Omnipress, 2010

work page 2010
[17]

Modeling delayed feedback in display advertising,

O. Chapelle, “Modeling delayed feedback in display advertising,” in Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’14, (New York, NY, USA), pp. 1097–1105, ACM, 2014

work page 2014
[18]

Click Through Rate Prediction for Contextual Advertisment Using Linear Regression

M. J. Eﬀendi and S. A. Ali, “Click through rate prediction for contextual advertisment using linear regression,” CoRR, vol. abs/1701.08744, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[19]

Ad click prediction: a view from the trenches,

H. B. McMahan, G. Holt, D. Sculley, M. Young, D. Ebner, J. Grady, L. Nie, T. Phillips, E. Davydov, D. Golovin, S. Chikkerur, D. Liu, M. Wattenberg, A. M. Hrafnkelsson, T. Boulos, and J. Kubica, “Ad click prediction: a view from the trenches,” in Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2013

work page 2013
[20]

Stochastic gradient boosting,

J. H. Friedman, “Stochastic gradient boosting,” Comput. Stat. Data Anal., vol. 38, pp. 367–378, Feb. 2002

work page 2002
[22]

From RankNet to LambdaRank to LambdaMART: An overview,

C. J. C. Burges, “From RankNet to LambdaRank to LambdaMART: An overview,” tech. rep., Microsoft Research, 2010

work page 2010
[23]

Learning the click-through rate for rare/new ads from similar ads,

K. S. Dave and V. Varma, “Learning the click-through rate for rare/new ads from similar ads,” in Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’10, (New York, NY, USA), pp. 897–898, ACM, 2010

work page 2010
[24]

On the optimality of the simple bayesian classiﬁer under zero-one loss,

P. Domingos and M. Pazzani, “On the optimality of the simple bayesian classiﬁer under zero-one loss,” Machine Learning, vol. 29, no. 2, pp. 103–130, 1997

work page 1997
[25]

Comparison of classiﬁcation methods based on the type of attributes and sample size.,

R. Entezari-Maleki, A. Rezaei, and B. Minaei-Bidgoli, “Comparison of classiﬁcation methods based on the type of attributes and sample size.,” JCIT, vol. 4, no. 3, pp. 94–102, 2009

work page 2009
[26]

Comparative study of classiﬁcation algorithms for immunosignaturing data.,

M. Kukreja, S. A. Johnston, and P. Staﬀord, “Comparative study of classiﬁcation algorithms for immunosignaturing data.,” BMC Bioinformatics, vol. 13, p. 139, 2012

work page 2012
[27]

Comparing machine learning classiﬁers in potential distribution modelling,

A. C. Lorena, L. F. Jacintho, M. F. Siqueira, R. D. Giovanni, L. G. Lohmann, A. C. de Carvalho, and M. Yamamoto, “Comparing machine learning classifiers in potential distribution modelling,” Expert Systems with Applications, vol. 38, no. 5, pp. 5268–5275, 2011

work page 2011
[28]

Deep Interest Network for Click-Through Rate Prediction

G. Zhou, C. Song, X. Zhu, X. Ma, Y. Yan, X. Dai, H. Zhu, J. Jin, H. Li, and K. Gai, “Deep interest network for click-through rate prediction,” CoRR, vol. abs/1706.06978, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[29]

A convolutional click prediction model,

Q. Liu, F. Yu, S. Wu, and L. Wang, “A convolutional click prediction model,” in Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, CIKM ’15, (New York, NY, USA), pp. 1743–1746, ACM, 2015

work page 2015
[30]

Deep learning over multi-ﬁeld categorical data: A case study on user response prediction,

W. Zhang, T. Du, and J. Wang, “Deep learning over multi-ﬁeld categorical data: A case study on user response prediction,” in ECIR, 2016

work page 2016
[31]

Random forests,

L. Breiman, “Random forests,” Machine Learning, vol. 45, no. 1, pp. 5–32, 2001

work page 2001
[32]

Classiﬁcation and regression by random forest,

A. Liaw and M. Wiener, “Classification and regression by random forest,” R News, vol. 2, no. 3, pp. 18–22, 2002

work page 2002
[33]

Intrusion detection in network systems through hybrid supervised and unsupervised machine learning process: A case study on the iscx dataset,

S. Soheily-Khah, P. Marteau, and N. Béchet, “Intrusion detection in network systems through hybrid supervised and unsupervised machine learning process: A case study on the iscx dataset,” in 2018 1st International Conference on Data Intelligence and Security (ICDIS), pp. 219–226, April 2018

work page 2018
[34]

Predicting ads’ click-through rate with decision rules,

K. Dembczynski, W. Kotlowski, and D. Weiss, “Predicting ads’ click-through rate with decision rules,” in WWW2008, Beijing, China, 2008

work page 2008
[35]

Using boosted trees for click-through rate prediction for sponsored search,

I. Trofimov, A. Kornetova, and V. Topinskiy, “Using boosted trees for click-through rate prediction for sponsored search,” in Proceedings of the Sixth International Workshop on Data Mining for Online Advertising and Internet Economy, ADKDD ’12, (New York, NY, USA), pp. 2:1–2:6, ACM, 2012

work page 2012
[36]

Predict the click-through rate and average cost per click for keywords using machine learning methodologies,

L. Shi and B. Li, “Predict the click-through rate and average cost per click for keywords using machine learning methodologies,” in Proceedings of the International Conference on Industrial Engineering and Operations Management, Detroit, Michigan, USA, 2016

work page 2016
[37]

Deep crossing: Web-scale modeling without manually crafted combinatorial features,

Y. Shan, T. R. Hoens, J. Jiao, H. Wang, D. Yu, and J. Mao, “Deep crossing: Web-scale modeling without manually crafted combinatorial features,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16, (New York, NY, USA), pp. 255–262, ACM, 2016

work page 2016
[38]

Ensemble learning using frequent itemset mining for anomaly detection,

S. Soheily-Khah and Y. Wu, “Ensemble learning using frequent itemset mining for anomaly detection,” in International Conference on Artificial Intelligence, Soft Computing and Applications (AIAA 2018), 2018

work page 2018
[39]

The chi-square test of independence,

M. L. McHugh, “The chi-square test of independence,” Biochemia Medica, vol. 23, pp. 143–149, 2013

work page 2013
[40]

Sample size and chi-squared test of fit: A comparison between a random sample approach and a chi-square value adjustment method using swedish adolescent data.,

D. Bergh, “Sample size and chi-squared test of fit: A comparison between a random sample approach and a chi-square value adjustment method using swedish adolescent data.,” in Pacific Rim Objective Measurement Symposium (PROMS) 2014 Conference Proceedings, pp. 197–211, 2015

work page 2014
[41]

T. M. Cover and J. A. Thomas, Elements of Information Theory 2nd Edition (Wiley Series in Telecommunications and Signal Processing). Wiley-Interscience, July 2006

work page 2006
[42]

Kullback, Information Theory And Statistics

S. Kullback, Information Theory And Statistics . Dover Pubns, 1997

work page 1997
[43]

Mutual information based input feature selection for classification problems,

S. Cang and H. Yu, “Mutual information based input feature selection for classification problems,” Decision Support Systems, vol. 54, no. 1, pp. 691–698, 2012

work page 2012
[44]

A review of feature selection methods based on mutual information,

J. R. Vergara and P. A. Estévez, “A review of feature selection methods based on mutual information,” Neural Computing and Applications, vol. 24, pp. 175–186, Jan. 2014

work page 2014
[45]

Do we need hundreds of classiﬁers to solve real world classiﬁcation problems?,

M. Fernández-Delgado, E. Cernadas, S. Barro, and D. Amorim, “Do we need hundreds of classifiers to solve real world classification problems?,” J. Mach. Learn. Res., vol. 15, pp. 3133–3181, Jan. 2014

work page 2014
[46]

Are random forests truly the best classiﬁers?,

M. Wainberg, B. Alipanahi, and B. J. Frey, “Are random forests truly the best classifiers?,” J. Mach. Learn. Res., vol. 17, pp. 3837–3841, Jan. 2016.

Authors: Saeid SOHEILY KHAH graduated in software engineering, and received master degree in artificial intelligence & robotics. He then received his second master degree in information analysis and management...

work page 2016

[1] [1]

An Enhanced Ad Event-Prediction Method Based on Feature Engineering

Introduction Ad event prediction is critical to many web applications including recommender systems, web search, sponsored search, and display advertising [1, 2, 3, 4, 5], and is a hot research direction in computational advertising [6, 7]. The event prediction is deﬁned to estimate the ratio of events such as videos, clicks or conversions to impressions ...

work page internal anchor Pith review Pith/arXiv arXiv 1907

[2] [2]

State-of-the-art In the literature, a variety of classiﬁcation techniques such as logistic regression, support vector machine, (deep) neural network, nearest neighbor, naive Bayes, decision tree and random forest have been widely used as machine learning and data mining techniques for ad event prediction applications. Logistic regression contains many tec...

work page

[3] [3]

In any artiﬁcial intelligence or machine learning algorithm (e.g

Feature engineering Feature engineering is the fundamental to the application of machine learning, data analysis and mining as well as mostly all artiﬁcial intelligence tasks, and generally, is diﬃcult, costly and expensive. In any artiﬁcial intelligence or machine learning algorithm (e.g. predictive and classiﬁcation models), the features in the data are...

work page

[4] [4]

Typically, there are plenty of recorded information, attributes and measures in an executed marketing campaign

The design choices The proposed feature engineering strategy is brieﬂy presented in to the following steps (see Algorithm 1), where in the reminder of this section, we explain in detail the proposed feature learning approach for the ad event prediction. Typically, there are plenty of recorded information, attributes and measures in an executed marketing c...

work page

[5] [5]

Experimental study In this section, we ﬁrst describe the dataset used to conduct our experiments, then specify the validation process, prior to present and discuss the results that we obtained. 5.1. Data description In this section, to clarify our claim in ad event prediction, we used a large real-world dataset of a running marketing campaign. The dataset...

work page

[6] [6]

In this framework, we propose two statistical approaches which can be used for feature selection: i) the adjusted Chi-squared test and ii) the adjusted mutual information

Conclusion This research work introduces an enhanced ad event prediction framework which has been applied on big data. In this framework, we propose two statistical approaches which can be used for feature selection: i) the adjusted Chi-squared test and ii) the adjusted mutual information. Then, by ranking the statistical measures we select the best featu...

work page

[7] [7]

Personalized click prediction in sponsored search,

H. Cheng and E. Cant´ u-Paz, “Personalized click prediction in sponsored search,” in Proceed- ings of the Third ACM International Conference on Web Search and Data Mining , WSDM ’10, (New York, NY, USA), pp. 351–360, ACM, 2010

work page 2010

[8] [8]

Sequential click prediction for sponsored search with recurrent neural networks,

Y. Zhang, H. Dai, C. Xu, J. Feng, T. Wang, J. Bian, B. Wang, and T.-Y. Liu, “Sequential click prediction for sponsored search with recurrent neural networks,” in Proceedings of the Twenty-Eighth AAAI Conference on Artiﬁcial Intelligence, pp. 1369–1375, AAAI Press, 2014

work page 2014

[9] [9]

Simple and scalable response prediction for display advertising,

O. Chapelle, E. Manavoglu, and R. Rosales, “Simple and scalable response prediction for display advertising,” ACM Trans. Intell. Syst. Technol. , vol. 5, pp. 61:1–61:34, Dec. 2014

work page 2014

[10] [10]

A neural click model for web search,

A. Borisov, I. Markov, M. de Rijke, and P. Serdyukov, “A neural click model for web search,” in Proceedings of the 25th International Conference on World Wide Web , pp. 531–541, 2016

work page 2016

[11] [11]

Wide & deep learning for recommender systems,

H.-T. Cheng, L. Koc, J. Harmsen, T. Shaked, T. Chandra, H. Aradhye, G. Anderson, G. Cor- rado, W. Chai, M. Ispir, R. Anil, Z. Haque, L. Hong, V. Jain, X. Liu, and H. Shah, “Wide & deep learning for recommender systems,” in Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, DLRS 2016, (New York, NY, USA), pp. 7–10, ACM, 2016

work page 2016

[12] [12]

Click-through prediction for advertising in twitter timeline,

C. Li, Y. Lu, Q. Mei, D. Wang, and S. Pandey, “Click-through prediction for advertising in twitter timeline,” in Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining , KDD ’15, pp. 1959–1968, ACM, 2015

work page 1959

[13] [13]

Deep ctr prediction in display advertising,

J. Chen, B. Sun, H. Li, H. Lu, and X.-S. Hua, “Deep ctr prediction in display advertising,” in Proceedings of the 24th ACM International Conference on Multimedia , MM ’16, (New York, NY, USA), pp. 811–820, ACM, 2016

work page 2016

[14] [14]

Predicting clicks: Estimating the click- through rate for new ads,

M. Richardson, E. Dominowska, and R. Ragno, “Predicting clicks: Estimating the click- through rate for new ads,” in Proceedings of the 16th International Conference on World Wide Web, WWW ’07, (New York, NY, USA), pp. 521–530, ACM, 2007

work page 2007

[15] [15]

Spatio-temporal models for estimating click-through rate,

D. Agarwal, B. C. Chen, and P. Elango, “Spatio-temporal models for estimating click-through rate,” in WWW ’09: Proceedings of the 18th international conference on World wide web, (New York, NY, USA), pp. 21–30, ACM, 2009

work page 2009

[16] [16]

Web-scale bayesian click- through rate prediction for sponsored search advertising in microsoft’s bing search engine,

T. Graepel, J. Q. n. Candela, T. Borchert, and R. Herbrich, “Web-scale bayesian click- through rate prediction for sponsored search advertising in microsoft’s bing search engine,” in Proceedings of the 27th International Conference on International Conference on Machine Learning, ICML’10, (USA), pp. 13–20, Omnipress, 2010

work page 2010

[17] [17]

Modeling delayed feedback in display advertising,

O. Chapelle, “Modeling delayed feedback in display advertising,” in Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining , KDD ’14, (New York, NY, USA), pp. 1097–1105, ACM, 2014

work page 2014

[18] [18]

Click Through Rate Prediction for Contextual Advertisment Using Linear Regression

M. J. Eﬀendi and S. A. Ali, “Click through rate prediction for contextual advertisment using linear regression,” CoRR, vol. abs/1701.08744, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[19] [19]

Ad click prediction: a view from the trenches,

H. B. McMahan, G. Holt, D. Sculley, M. Young, D. Ebner, J. Grady, L. Nie, T. Phillips, E. Davydov, D. Golovin, S. Chikkerur, D. Liu, M. Wattenberg, A. M. Hrafnkelsson, T. Boulos, and J. Kubica, “Ad click prediction: a view from the trenches,” in Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) , 2013


[20] [20]

Stochastic gradient boosting,

J. H. Friedman, “Stochastic gradient boosting,” Comput. Stat. Data Anal., vol. 38, pp. 367–378, Feb. 2002


[21] [22]

From RankNet to LambdaRank to LambdaMART: An overview,

C. J. C. Burges, “From RankNet to LambdaRank to LambdaMART: An overview,” tech. rep., Microsoft Research, 2010


[22] [23]

Learning the click-through rate for rare/new ads from similar ads,

K. S. Dave and V. Varma, “Learning the click-through rate for rare/new ads from similar ads,” in Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’10, (New York, NY, USA), pp. 897–898, ACM, 2010


[23] [24]

On the optimality of the simple Bayesian classifier under zero-one loss,

P. Domingos and M. Pazzani, “On the optimality of the simple Bayesian classifier under zero-one loss,” Machine Learning, vol. 29, no. 2, pp. 103–130, 1997


[24] [25]

Comparison of classification methods based on the type of attributes and sample size,

R. Entezari-Maleki, A. Rezaei, and B. Minaei-Bidgoli, “Comparison of classification methods based on the type of attributes and sample size,” JCIT, vol. 4, no. 3, pp. 94–102, 2009


[25] [26]

Comparative study of classification algorithms for immunosignaturing data,

M. Kukreja, S. A. Johnston, and P. Stafford, “Comparative study of classification algorithms for immunosignaturing data,” BMC Bioinformatics, vol. 13, p. 139, 2012


[26] [27]

Comparing machine learning classifiers in potential distribution modelling,

A. C. Lorena, L. F. Jacintho, M. F. Siqueira, R. D. Giovanni, L. G. Lohmann, A. C. de Carvalho, and M. Yamamoto, “Comparing machine learning classifiers in potential distribution modelling,” Expert Systems with Applications, vol. 38, no. 5, pp. 5268–5275, 2011


[27] [28]

Deep Interest Network for Click-Through Rate Prediction

G. Zhou, C. Song, X. Zhu, X. Ma, Y. Yan, X. Dai, H. Zhu, J. Jin, H. Li, and K. Gai, “Deep interest network for click-through rate prediction,” CoRR, vol. abs/1706.06978, 2017


[28] [29]

A convolutional click prediction model,

Q. Liu, F. Yu, S. Wu, and L. Wang, “A convolutional click prediction model,” in Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, CIKM ’15, (New York, NY, USA), pp. 1743–1746, ACM, 2015


[29] [30]

Deep learning over multi-field categorical data: A case study on user response prediction,

W. Zhang, T. Du, and J. Wang, “Deep learning over multi-field categorical data: A case study on user response prediction,” in ECIR, 2016


[30] [31]

Random forests,

L. Breiman, “Random forests,” Machine Learning, vol. 45, no. 1, pp. 5–32, 2001


[31] [32]

Classification and regression by random forest,

A. Liaw and M. Wiener, “Classification and regression by random forest,” R News, vol. 2, no. 3, pp. 18–22, 2002


[32] [33]

Intrusion detection in network systems through hybrid supervised and unsupervised machine learning process: A case study on the ISCX dataset,

S. Soheily-Khah, P. Marteau, and N. Béchet, “Intrusion detection in network systems through hybrid supervised and unsupervised machine learning process: A case study on the ISCX dataset,” in 2018 1st International Conference on Data Intelligence and Security (ICDIS), pp. 219–226, April 2018


[33] [34]

Predicting ads’ click-through rate with decision rules,

K. Dembczynski, W. Kotlowski, and D. Weiss, “Predicting ads’ click-through rate with decision rules,” in WWW2008, Beijing, China, 2008


[34] [35]

Using boosted trees for click-through rate prediction for sponsored search,

I. Trofimov, A. Kornetova, and V. Topinskiy, “Using boosted trees for click-through rate prediction for sponsored search,” in Proceedings of the Sixth International Workshop on Data Mining for Online Advertising and Internet Economy, ADKDD ’12, (New York, NY, USA), pp. 2:1–2:6, ACM, 2012


[35] [36]

Predict the click-through rate and average cost per click for keywords using machine learning methodologies,

L. Shi and B. Li, “Predict the click-through rate and average cost per click for keywords using machine learning methodologies,” in Proceedings of the International Conference on Industrial Engineering and Operations Management, Detroit, Michigan, USA, 2016


[36] [37]

Deep crossing: Web-scale modeling without manually crafted combinatorial features,

Y. Shan, T. R. Hoens, J. Jiao, H. Wang, D. Yu, and J. Mao, “Deep crossing: Web-scale modeling without manually crafted combinatorial features,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16, (New York, NY, USA), pp. 255–262, ACM, 2016


[37] [38]

Ensemble learning using frequent itemset mining for anomaly detection,

S. Soheily-Khah and Y. Wu, “Ensemble learning using frequent itemset mining for anomaly detection,” in International Conference on Artificial Intelligence, Soft Computing and Applications (AIAA 2018), 2018


[38] [39]

The chi-square test of independence,

M. L. McHugh, “The chi-square test of independence,” Biochemia Medica, vol. 23, pp. 143–149, 2013


[39] [40]

Sample size and chi-squared test of fit: A comparison between a random sample approach and a chi-square value adjustment method using Swedish adolescent data,

D. Bergh, “Sample size and chi-squared test of fit: A comparison between a random sample approach and a chi-square value adjustment method using Swedish adolescent data,” in Pacific Rim Objective Measurement Symposium (PROMS) 2014 Conference Proceedings, pp. 197–211, 2015


[40] [41]

T. M. Cover and J. A. Thomas, Elements of Information Theory 2nd Edition (Wiley Series in Telecommunications and Signal Processing). Wiley-Interscience, July 2006


[41] [42]

Information Theory and Statistics

S. Kullback, Information Theory and Statistics. Dover Publications, 1997


[42] [43]

Mutual information based input feature selection for classification problems,

S. Cang and H. Yu, “Mutual information based input feature selection for classification problems,” Decision Support Systems, vol. 54, no. 1, pp. 691–698, 2012


[43] [44]

A review of feature selection methods based on mutual information,

J. R. Vergara and P. A. Estévez, “A review of feature selection methods based on mutual information,” Neural Computing and Applications, vol. 24, pp. 175–186, Jan. 2014


[44] [45]

Do we need hundreds of classifiers to solve real world classification problems?,

M. Fernández-Delgado, E. Cernadas, S. Barro, and D. Amorim, “Do we need hundreds of classifiers to solve real world classification problems?,” J. Mach. Learn. Res., vol. 15, pp. 3133–3181, Jan. 2014


[45] [46]

Are random forests truly the best classifiers?,

M. Wainberg, B. Alipanahi, and B. J. Frey, “Are random forests truly the best classifiers?,” J. Mach. Learn. Res., vol. 17, pp. 3837–3841, Jan. 2016.

Authors

Saeid SOHEILY KHAH graduated in software engineering and received a master's degree in artificial intelligence & robotics. He then received his second master's degree in information analysis and management...
