What Do Developers Ask About ML Libraries? A Large-scale Study Using Stack Overflow

Hoan Anh Nguyen; Hridesh Rajan; Md Johirul Islam; Rangeet Pan

arxiv: 1906.11940 · v1 · pith:BWD2XQ6B · submitted 2019-06-27 · 💻 cs.SE

Pith reviewed 2026-05-25 14:13 UTC · model grok-4.3

classification 💻 cs.SE

keywords machine learning libraries · stack overflow · developer questions · API misuse · ML pipeline · software engineering · error detection

The pith

Analysis of 3,243 Stack Overflow posts on ten ML libraries shows that static and dynamic analysis support is mostly absent and that API misuses are common.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines 3,243 highly-rated questions from Stack Overflow about ten ML libraries to map the difficulties developers encounter when incorporating machine learning into their systems. Questions are classified into seven stages of a standard ML pipeline and then analyzed statistically against four research objectives: identifying the most difficult stage, characterizing the nature of the problems, comparing libraries, and checking whether the difficulties changed over time. The results indicate that support for early error detection is lacking and that API design changes could address frequent misuses. The work concludes that software engineering research must address these gaps to help developers avoid problems during model training and evaluation.
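The fourth objective, whether difficulties stayed consistent over time, amounts to comparing each stage's share of questions year by year. The sketch below assumes each labeled post has been reduced to a `(year, stage)` pair; the pair format, the stage names, and the sample posts are hypothetical and are not data from the paper.

```python
from collections import Counter, defaultdict


def stage_share_by_year(posts):
    """Map each year to {stage: fraction of that year's labeled posts}.

    `posts` is an iterable of (year, stage) pairs. This format is an
    assumption for illustration, not the paper's data schema.
    """
    counts = defaultdict(Counter)
    for year, stage in posts:
        counts[year][stage] += 1
    shares = {}
    for year in sorted(counts):
        total = sum(counts[year].values())
        shares[year] = {stage: n / total for stage, n in counts[year].items()}
    return shares


# Hypothetical labeled posts (not from the study).
sample = [
    (2015, "training"), (2015, "training"),
    (2015, "data preparation"), (2015, "evaluation"),
    (2018, "training"), (2018, "data preparation"),
    (2018, "data preparation"), (2018, "data preparation"),
]

for year, dist in stage_share_by_year(sample).items():
    print(year, {stage: round(share, 2) for stage, share in sorted(dist.items())})
# 2015 {'data preparation': 0.25, 'evaluation': 0.25, 'training': 0.5}
# 2018 {'data preparation': 0.75, 'training': 0.25}
```

Normalizing within each year keeps years with more posts from dominating the comparison; deciding whether a shift is meaningful would still require a statistical test.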

Core claim

Our findings reveal the urgent need for software engineering research in this area. Both static and dynamic analyses are mostly absent and badly needed to help developers find errors earlier. API misuses are prevalent and API design improvements are sorely needed. Last and somewhat surprisingly, a tug of war between providing higher levels of abstractions and the need to understand the behavior of the trained model is prevalent.

What carries the argument

Manual classification of questions into seven stages of an ML pipeline followed by statistical analysis across libraries and time periods.
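Once every post carries a (library, stage) label, the counts behind the statistical analysis are simple tallies. A minimal sketch follows; the labels are hypothetical, and using the raw question count as the measure of the "most difficult" stage is an illustrative simplification, not necessarily the paper's criterion.

```python
from collections import Counter


def tally(labels):
    """Count (library, stage) pairs and per-stage totals."""
    by_pair = Counter(labels)
    by_stage = Counter(stage for _library, stage in labels)
    return by_pair, by_stage


def stage_profile(by_pair, library, stages):
    """Fraction of one library's questions that fall in each stage.

    Raises ZeroDivisionError if the library has no labeled questions.
    """
    total = sum(by_pair[(library, s)] for s in stages)
    return {s: by_pair[(library, s)] / total for s in stages}


# Hypothetical manual labels (not from the study).
labels = [
    ("tensorflow", "training"), ("tensorflow", "training"),
    ("keras", "training"), ("keras", "data preparation"),
    ("scikit-learn", "evaluation"), ("scikit-learn", "data preparation"),
    ("scikit-learn", "training"),
]
stages = ["data preparation", "training", "evaluation"]

by_pair, by_stage = tally(labels)
print(by_stage.most_common(1))  # [('training', 4)]
print(stage_profile(by_pair, "keras", stages))
# {'data preparation': 0.5, 'training': 0.5, 'evaluation': 0.0}
```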

If this is right

Static and dynamic analysis techniques tailored to ML library usage are needed to help developers catch errors earlier, ideally before runtime.
Debugging support for ML systems requires substantially more research attention.
Redesign of ML library APIs is needed to reduce the rate of misuses observed in the questions.
Approaches that reconcile high-level abstractions with visibility into trained model internals should be explored.
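As a rough illustration of the early error detection the first point calls for, the sketch below validates a declared chain of fully connected layers before any training runs and reports dimension mismatches. The `(name, in_features, out_features)` spec is a hypothetical format, not any ML library's API.

```python
def check_dense_chain(input_dim, layers):
    """Report dimension mismatches in a chain of fully connected layers.

    `layers` is a list of (name, in_features, out_features) tuples: a
    hypothetical declarative spec, not any ML library's API. Checking
    continues after a mismatch so that every error is reported at once.
    """
    errors = []
    current = input_dim
    for name, in_features, out_features in layers:
        if in_features != current:
            errors.append(
                f"{name}: expects {in_features} inputs but receives {current}"
            )
        current = out_features
    return errors


# Hypothetical model: the second layer was declared with the wrong fan-in.
spec = [("hidden", 784, 128), ("output", 64, 10)]
for message in check_dense_chain(784, spec):
    print(message)  # output: expects 64 inputs but receives 128
```

Because the check only reads declared sizes, it runs before any data is loaded. A real static analysis would have to recover these sizes from library calls, which is where most of the difficulty lies.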

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Library maintainers could instrument their code with additional checks at the training and evaluation stages where questions cluster.
The observed tension between abstraction and model transparency may affect adoption rates of newer high-level ML frameworks.
Educational materials and documentation for ML libraries should prioritize the pipeline stages that generate the most questions.
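One form the suggested instrumentation at the training stage could take is a fail-fast check on the loss. The sketch below is framework-agnostic: `step_fn` is a hypothetical stand-in for one optimization step that returns a scalar loss, and the guard raises as soon as the loss becomes NaN or infinite instead of letting training continue silently.

```python
import math


class NonFiniteLossError(RuntimeError):
    """Raised when a training step produces a NaN or infinite loss."""


def guarded_training(step_fn, n_steps):
    """Run step_fn(step) -> loss for n_steps, failing fast on non-finite loss."""
    losses = []
    for step in range(n_steps):
        loss = step_fn(step)
        if not math.isfinite(loss):
            raise NonFiniteLossError(f"non-finite loss {loss!r} at step {step}")
        losses.append(loss)
    return losses


# Hypothetical step function whose loss becomes NaN at step 3.
def diverging_step(step):
    return float("nan") if step == 3 else 1.0 / (step + 1)


try:
    guarded_training(diverging_step, 10)
except NonFiniteLossError as err:
    print(err)  # non-finite loss nan at step 3
```

Raising at the first bad step keeps the failure next to its cause; a library-level version might also record that step's inputs to aid debugging.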

Load-bearing premise

The 3,243 highly-rated Q&A posts selected from Stack Overflow are representative of the difficulties faced by software developers when learning about and using ML libraries in their systems.

What would settle it

A follow-up survey or interview study of practicing ML developers that finds their most common problems do not match the distribution of stages and error types identified in the Stack Overflow posts.
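One way to run that comparison is a chi-square goodness-of-fit test of survey counts against the stage proportions observed on Stack Overflow. The sketch below collapses the stages into three hypothetical groups so that the test has two degrees of freedom; in that case the chi-square upper-tail probability has the closed form exp(-x/2), so no statistics library is needed. All numbers are invented for illustration.

```python
import math


def chi_square_gof(observed, expected_shares):
    """Pearson chi-square statistic for counts vs. expected proportions.

    Every category in `observed` must appear in `expected_shares`, and the
    expected shares must sum to 1.
    """
    if set(observed) - set(expected_shares):
        raise ValueError("observed category missing from expected_shares")
    if abs(sum(expected_shares.values()) - 1.0) > 1e-9:
        raise ValueError("expected shares must sum to 1")
    total = sum(observed.values())
    stat = 0.0
    for category, share in expected_shares.items():
        expected = total * share
        stat += (observed.get(category, 0) - expected) ** 2 / expected
    return stat


def p_value_df2(stat):
    """Upper-tail p-value of a chi-square statistic with 2 degrees of freedom."""
    return math.exp(-stat / 2)


# Hypothetical Stack Overflow proportions and hypothetical survey counts.
so_shares = {"data": 0.3, "training": 0.5, "evaluation": 0.2}
survey_counts = {"data": 45, "training": 40, "evaluation": 15}

stat = chi_square_gof(survey_counts, so_shares)
print(round(stat, 2))  # 10.75
print(p_value_df2(stat) < 0.05)  # True
```

With all seven stages kept separate the test would have six degrees of freedom, and a statistics package (for example SciPy's `scipy.stats.chisquare`) is the practical way to obtain the p-value. The usual caveat applies: expected counts per category should not be too small.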

Figures

Figures reproduced from arXiv: 1906.11940 by Hoan Anh Nguyen, Hridesh Rajan, Md Johirul Islam, Rangeet Pan.

**Figure 2.** Then, a training session was conducted where each …

**Figure 2.** Classification used for categorizing ML library-related Stack Overflow questions for further analysis.

**Figure 3.** Cohen’s kappa coefficients for the labeling process.

**Figure 4.** Question 40430186: An example showing dimension …

**Figure 5.** Question 12319454: An example question on model …

**Figure 6.** Question 45030966: An example question about …

**Figure 7.** scikit-learn issue #4800: An example of a hyperparameter tuning problem. The user filed a bug report, but a developer of the library responded that the problem was with hyperparameter tuning. The figure lists the signature `class sklearn.ensemble.AdaBoostClassifier(base_estimator=None, n_estimators=50, learning_rate=1.0, algorithm='SAMME.R', random_state=None)`. The base estima…

**Figure 8.** We have identified two major groups. Group 1: Weka, H2O, scikit-learn, and MLlib form a strongly correlated group, with correlation coefficients greater than 0.84 between each pair. This suggests that the problems appearing in these libraries are correlated and that the difficulties of one library can be described by those of the other libraries in the group. Finding 11: Weka, H2O, scikit-learn, MLlib…

**Figure 9.** Question 24617356: An example showing the API …

**Figure 10.** Difficulties over time, across different stages.
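Figure 8 groups libraries whose pairwise correlation coefficients exceed 0.84. The sketch below shows one way such a grouping could be computed: Pearson correlation between two libraries' per-stage question counts, keeping the pairs above the threshold. The input vectors are invented, and correlating per-stage counts is an assumption about the setup rather than a description of the paper's exact procedure. (Python 3.10+ also offers `statistics.correlation`; the explicit version is shown for clarity.)

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def strongly_correlated_pairs(profiles, threshold=0.84):
    """Library pairs whose stage profiles correlate above `threshold`."""
    return [
        (a, b)
        for a, b in combinations(sorted(profiles), 2)
        if pearson(profiles[a], profiles[b]) > threshold
    ]


# Hypothetical per-stage question counts (same stage order for every library).
profiles = {
    "lib_a": [10, 25, 30, 5, 20, 5, 5],
    "lib_b": [12, 22, 33, 6, 18, 4, 5],
    "lib_c": [30, 5, 8, 25, 4, 20, 8],
}
print(strongly_correlated_pairs(profiles))  # [('lib_a', 'lib_b')]
```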

Original abstract

Modern software systems are increasingly including machine learning (ML) as an integral component. However, we do not yet understand the difficulties faced by software developers when learning about ML libraries and using them within their systems. To that end, this work reports on a detailed (manual) examination of 3,243 highly-rated Q&A posts related to ten ML libraries, namely TensorFlow, Keras, scikit-learn, Weka, Caffe, Theano, MLlib, Torch, Mahout, and H2O, on Stack Overflow, a popular online technical Q&A forum. We classify these questions into seven typical stages of an ML pipeline to understand the correlation between the library and the stage. Then we study the questions and perform statistical analysis to explore the answer to four research objectives (finding the most difficult stage, understanding the nature of problems, nature of libraries and studying whether the difficulties stayed consistent over time). Our findings reveal the urgent need for software engineering (SE) research in this area. Both static and dynamic analyses are mostly absent and badly needed to help developers find errors earlier. While there has been some early research on debugging, much more work is needed. API misuses are prevalent and API design improvements are sorely needed. Last and somewhat surprisingly, a tug of war between providing higher levels of abstractions and the need to understand the behavior of the trained model is prevalent.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note · private letter to a colleague

The paper catalogs questions on ten ML libraries from highly-rated Stack Overflow posts and breaks them down by pipeline stage, but the push for urgent new tooling rests on untested assumptions about what those posts represent.

This study manually classifies over three thousand Stack Overflow posts on TensorFlow, Keras, scikit-learn, and seven other libraries into seven ML pipeline stages. It reports distributions and some time trends, and notes that API misuse questions are common while static and dynamic analysis support is thin. That classification is the genuinely new data point: earlier SO studies were broader and did not drill into these specific libraries and stages at the same granularity.

Referee Report

2 major / 2 minor

Summary. The paper reports a manual examination and statistical analysis of 3,243 highly-rated Stack Overflow Q&A posts across ten ML libraries (TensorFlow, Keras, scikit-learn, etc.). Posts are classified into seven stages of a typical ML pipeline; the authors then address four research objectives on the most difficult stage, nature of problems, library differences, and temporal stability, concluding that static/dynamic analyses are absent, API misuses are prevalent, and API design improvements plus further SE research are urgently needed.

Significance. If the classification process proves reliable and the highly-rated SO sample is representative, the work supplies concrete evidence of tooling gaps at the SE-ML boundary and could usefully guide priorities for static analysis, debugging support, and API usability research. The multi-library scope and pipeline-stage framing are strengths that would make the findings actionable for both researchers and library maintainers.

major comments (2)

[Methodology (data collection and classification)] Methodology section (data collection and classification): the abstract and text describe manual examination and assignment to seven pipeline stages but supply no information on inter-rater reliability, how the seven stages themselves were validated or pilot-tested, or the precise exclusion criteria applied to arrive at the final 3,243 posts. These omissions directly affect the soundness of every subsequent statistical claim and the identification of 'most difficult' stages.
[Results and Discussion] Results and Discussion sections: the central extrapolation that 'both static and dynamic analyses are mostly absent and badly needed' and that 'API misuses are prevalent' rests on the assumption that the selected highly-rated SO posts represent the difficulties faced by developers in general. No cross-validation against GitHub issues, developer surveys, or usage telemetry is reported, leaving the generalization load-bearing for the 'urgent need' conclusion.
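Inter-rater reliability of the kind the first comment asks about is commonly quantified with Cohen's kappa, and the paper's Figure 3 reports kappa coefficients for its labeling process. A minimal sketch of the standard formula follows, kappa = (p_o - p_e) / (1 - p_e); the two raters' stage labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and
    p_e is the agreement expected from each rater's label frequencies.
    """
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("raters must label the same, non-empty set of items")
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical stage labels from two raters for the same six posts.
rater_1 = ["training", "training", "evaluation", "data", "training", "data"]
rater_2 = ["training", "evaluation", "evaluation", "data", "training", "data"]
print(round(cohens_kappa(rater_1, rater_2), 2))  # 0.75
```

Here the raters agree on 5 of 6 posts (p_o = 5/6), chance agreement from their label frequencies is p_e = 12/36 = 1/3, so kappa = (1/2) / (2/3) = 0.75.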

minor comments (2)

[Abstract] Abstract: the phrase 'highly-rated' is used without stating the exact rating threshold or vote count applied during selection.
[Introduction / Research Objectives] The description of the four research objectives would benefit from explicit mapping to the statistical tests or tables that address each one.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the two major comments below, indicating planned revisions where appropriate.

Referee: Methodology section (data collection and classification): the abstract and text describe manual examination and assignment to seven pipeline stages but supply no information on inter-rater reliability, how the seven stages themselves were validated or pilot-tested, or the precise exclusion criteria applied to arrive at the final 3,243 posts. These omissions directly affect the soundness of every subsequent statistical claim and the identification of 'most difficult' stages.

Authors: We agree that the methodology section would benefit from greater transparency. The seven pipeline stages were derived from standard descriptions in the ML and SE literature (e.g., data preparation, model training, evaluation). Selection criteria included posts tagged with the ten libraries, having an accepted answer, and meeting a minimum score threshold to focus on highly-rated content; non-English posts and duplicates were excluded. Classification was performed by the first two authors, with disagreements resolved via discussion until consensus. Inter-rater agreement was reported as Cohen's kappa coefficients (Figure 3), and we will state these values explicitly alongside the procedure. We will expand the methodology section with explicit stage definitions, a description of the pilot phase used to refine the stages, the exact selection and exclusion rules, and the consensus process. revision: yes
Referee: Results and Discussion sections: the central extrapolation that 'both static and dynamic analyses are mostly absent and badly needed' and that 'API misuses are prevalent' rests on the assumption that the selected highly-rated SO posts represent the difficulties faced by developers in general. No cross-validation against GitHub issues, developer surveys, or usage telemetry is reported, leaving the generalization load-bearing for the 'urgent need' conclusion.

Authors: The study is explicitly scoped to highly-rated Stack Overflow posts, which serve as a public record of developer difficulties that have been vetted by the community through votes and answers. This source is commonly used in empirical SE research on API usage and learning barriers. We acknowledge that the absence of triangulation with GitHub issues or surveys limits the strength of broad generalizations. We will revise the discussion and threats-to-validity sections to (a) more precisely bound the claims to the SO dataset and (b) explicitly call for future multi-source validation studies. revision: partial
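The selection criteria named in the first response (library tags, an accepted answer, a minimum score, and removal of duplicates) translate directly into a filter, which also makes the minor comment about the rating threshold concrete: the threshold is a single parameter that should be reported. The sketch is hypothetical throughout; the `Post` record, the tag spellings, and the threshold of 5 are assumptions, not the study's values. Non-English filtering is omitted because it needs a language detector.

```python
from dataclasses import dataclass

# Assumed Stack Overflow tag names for the ten libraries (hypothetical).
LIBRARY_TAGS = frozenset({
    "tensorflow", "keras", "scikit-learn", "weka", "caffe",
    "theano", "apache-spark-mllib", "torch", "mahout", "h2o",
})
MIN_SCORE = 5  # hypothetical "highly-rated" threshold


@dataclass(frozen=True)
class Post:
    post_id: int
    tags: frozenset
    score: int
    has_accepted_answer: bool
    is_duplicate: bool = False


def select_posts(posts, min_score=MIN_SCORE):
    """Keep posts that carry a library tag, have an accepted answer,
    meet the score threshold, and are not marked as duplicates."""
    return [
        p for p in posts
        if p.tags & LIBRARY_TAGS
        and p.has_accepted_answer
        and p.score >= min_score
        and not p.is_duplicate
    ]


posts = [
    Post(1, frozenset({"keras", "python"}), 12, True),
    Post(2, frozenset({"pandas"}), 40, True),                   # no library tag
    Post(3, frozenset({"tensorflow"}), 2, True),                # score too low
    Post(4, frozenset({"weka"}), 9, False),                     # no accepted answer
    Post(5, frozenset({"caffe"}), 7, True, is_duplicate=True),  # duplicate
]
print([p.post_id for p in select_posts(posts)])  # [1]
```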

Circularity Check

0 steps flagged

No circularity: purely observational empirical classification study

full rationale

The paper conducts a manual examination and classification of 3,243 Stack Overflow posts into ML pipeline stages, followed by statistical analysis of observed patterns (e.g., prevalent API misuses). No equations, fitted parameters renamed as predictions, self-citation chains, uniqueness theorems, or ansatzes are present. All claims derive directly from the selected data without reduction to inputs by construction. Representativeness concerns affect external validity but do not create circularity in the reported derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim depends on the validity of the chosen seven-stage ML pipeline taxonomy and the assumption that highly-rated Stack Overflow posts capture representative developer difficulties; no free parameters or invented entities are introduced.

axioms (1)

domain assumption: The seven typical stages of an ML pipeline form a valid and exhaustive categorization for classifying developer questions.
Invoked to correlate library with stage and to identify the most difficult stage.

pith-pipeline@v0.9.0 · 5787 in / 1105 out tokens · 29362 ms · 2026-05-25T14:13:31.035028+00:00 · methodology

Reference graph

Works this paper leans on

48 extracted references · 48 canonical work pages · 1 internal anchor

[1] kdnuggets, “Top 15 Frameworks for Machine Learning Experts,” 2016, https://www.kdnuggets.com/2016/04/top-15-frameworks-machine-learning-experts.html

[2] D. Sculley, T. Phillips, D. Ebner, V. Chaudhary, and M. Young, “Machine learning: The high-interest credit card of technical debt,” 2014.

[3] D. Sculley, G. Holt, D. Golovin, E. Davydov, T. Phillips, D. Ebner, V. Chaudhary, M. Young, J.-F. Crespo, and D. Dennison, “Hidden technical debt in machine learning systems,” in Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 2, ser. NIPS’15. Cambridge, MA, USA: MIT Press, 2015, pp. 2503–2511. [Online]...

[4] E. Breck, S. Cai, E. Nielsen, M. Salib, and D. Sculley, “What’s your ml test score? a rubric for ml production systems,” in NIPS Workshop on Reliable Machine Learning in the Wild, 2016.

[5] C. Treude, O. Barzilay, and M.-A. Storey, “How do programmers ask and answer questions on the web?: Nier track,” in Software Engineering (ICSE), 2011 33rd International Conference on. IEEE, 2011, pp. 804–807.

[6] S. Wang, D. Lo, and L. Jiang, “An empirical study on developer interactions in stackoverflow,” in Proceedings of the 28th Annual ACM Symposium on Applied Computing. ACM, 2013, pp. 1019–1024.

[7] J. Yang, K. Tao, A. Bozzon, and G.-J. Houben, “Sparrows and owls: Characterisation of expert behaviour in stackoverflow,” in International Conference on User Modeling, Adaptation, and Personalization. Springer, 2014, pp. 266–277.

[8] D. Kavaler, D. Posnett, C. Gibler, H. Chen, P. Devanbu, and V. Filkov, “Using and asking: APIs used in the android market and asked about in stackoverflow,” in International Conference on Social Informatics. Springer, 2013, pp. 405–418.

[9] M. Linares-Vásquez, G. Bavota, M. Di Penta, R. Oliveto, and D. Poshyvanyk, “How do API changes trigger stack overflow discussions? a study on the android sdk,” in Proceedings of the 22nd International Conference on Program Comprehension. ACM, 2014, pp. 83–94.

[10] T. P. Sahu, N. K. Nagwani, and S. Verma, “Selecting best answer: An empirical analysis on community question answering sites,” IEEE Access, vol. 4, pp. 4797–4808, 2016.

[11] A. Barua, S. W. Thomas, and A. E. Hassan, “What are developers talking about? an analysis of topics and trends in stack overflow,” Empirical Software Engineering, vol. 19, no. 3, pp. 619–654, 2014.

[12] W. Wang and M. W. Godfrey, “Detecting api usage obstacles: A study of ios and android developer questions,” in Proceedings of the 10th Working Conference on Mining Software Repositories. IEEE Press, 2013, pp. 61–64.

[13] M. Rebouças, G. Pinto, F. Ebert, W. Torres, A. Serebrenik, and F. Castor, “An empirical study on the usage of the swift programming language,” in Software Analysis, Evolution, and Reengineering (SANER), 2016 IEEE 23rd International Conference on, vol. 1. IEEE, 2016, pp. 634–638.

[14] Y. Zhang, G. Yin, T. Wang, Y. Yu, and H. Wang, “Evaluating bug severity using crowd-based knowledge: An exploratory study,” in Proceedings of the 7th Asia-Pacific Symposium on Internetware. ACM, 2015, pp. 70–73.

[15] D. Schenk and M. Lungu, “Geo-locating the knowledge transfer in stackoverflow,” in Proceedings of the 2013 International Workshop on Social Software Engineering. ACM, 2013, pp. 21–24.

[16] C. Stanley and M. D. Byrne, “Predicting tags for stackoverflow posts,” in Proceedings of ICCM, vol. 2013, 2013.

[17] T. McDonnell, B. Ray, and M. Kim, “An empirical study of api stability and adoption in the android ecosystem,” in Software Maintenance (ICSM), 2013 29th IEEE International Conference on. IEEE, 2013, pp. 70–79.

[18] A. Baltadzhieva and G. Chrupała, “Predicting the quality of questions on stackoverflow,” in Proceedings of the International Conference Recent Advances in Natural Language Processing, 2015, pp. 32–40.

[19] A. Joorabchi, M. English, and A. E. Mahdi, “Text mining stackoverflow: An insight into challenges and subject-related difficulties faced by computer science learners,” Journal of Enterprise Information Management, vol. 29, no. 2, pp. 255–275, 2016.

[20] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, “Caffe: Convolutional architecture for fast feature embedding,” in Proceedings of the 22nd ACM international conference on Multimedia. ACM, 2014, pp. 675–678.

[21] A. Candel, V. Parmar, E. LeDell, and A. Arora, “Deep learning with h2o,” H2O.ai Inc, 2016.

[22] F. Chollet et al., “Keras,” https://github.com/fchollet/keras, 2015.

[23] S. Owen and S. Owen, Mahout in action. Manning, Shelter Island, NY, 2012.

[24] X. Meng, J. Bradley, B. Yavuz, E. Sparks, S. Venkataraman, D. Liu, J. Freeman, D. Tsai, M. Amde, S. Owen et al., “Mllib: Machine learning in apache spark,” The Journal of Machine Learning Research, vol. 17, no. 1, pp. 1235–1241, 2016.

[25] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg et al., “Scikit-learn: Machine learning in python,” Journal of machine learning research, vol. 12, no. Oct, pp. 2825–2830, 2011.

[26] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard et al., “Tensorflow: A system for large-scale machine learning,” in OSDI, vol. 16, 2016, pp. 265–283.

[27] J. Bergstra, F. Bastien, O. Breuleux, P. Lamblin, R. Pascanu, O. Delalleau, G. Desjardins, D. Warde-Farley, I. Goodfellow, A. Bergeron et al., “Theano: Deep learning on gpus with python,” in NIPS 2011, BigLearning Workshop, Granada, Spain, vol. 3. Citeseer, 2011.

[28] R. Collobert, S. Bengio, and J. Mariéthoz, “Torch: a modular machine learning software library,” Idiap, Tech. Rep., 2002.

[29] G. Holmes, A. Donkin, and I. H. Witten, “Weka: A machine learning workbench,” in Intelligent Information Systems, 1994. Proceedings of the 1994 Second Australian and New Zealand Conference on. IEEE, 1994, pp. 357–361.

[30] L. Buitinck, G. Louppe, M. Blondel, F. Pedregosa, A. Mueller, O. Grisel, V. Niculae, P. Prettenhofer, A. Gramfort, J. Grobler, R. Layton, J. VanderPlas, A. Joly, B. Holt, and G. Varoquaux, “API design for machine learning software: experiences from the scikit-learn project,” in ECML PKDD Workshop: Languages for Data Mining and Machine Learning, 2013, p...

[31] Z. Wang, K. Liu, J. Li, Y. Zhu, and Y. Zhang, “Various frameworks and libraries of machine learning and deep learning: A survey,” Archives of Computational Methods in Engineering, pp. 1–24, 2019.

[32] G. Nguyen, S. Dlugolinsky, M. Bobák, V. Tran, Á. L. García, I. Heredia, P. Malík, and L. Hluchý, “Machine learning and deep learning frameworks and libraries for large-scale data mining: a survey,” Artificial Intelligence Review, pp. 1–48, 2019.

[33] S. Meldrum, S. A. Licorish, and B. T. R. Savarimuthu, “Crowdsourced knowledge on stack overflow: A systematic mapping study,” in Proceedings of the 21st International Conference on Evaluation and Assessment in Software Engineering. ACM, 2017, pp. 180–185.

[34] Yufeng Guo, “The 7 Steps of Machine Learning,” 2017, https://towardsdatascience.com/the-7-steps-of-machine-learning-2877d7e5548e

[35] S. Lockyer, “Coding qualitative data,” The Sage encyclopedia of social science research methods, vol. 1, no. 1, pp. 137–138, 2004.

[36] R. S. Life, “Qualitative data analysis,” 1994.

[37] A. Strauss and J. Corbin, Basics of qualitative research. Sage publications, 1990.

[38] K. A. Hallgren, “Computing inter-rater reliability for observational data: an overview and tutorial,” Tutorials in quantitative methods for psychology, vol. 8, no. 1, p. 23, 2012.

[39] G. Guzzetta, G. Jurman, and C. Furlanello, “A machine learning pipeline for quantitative phenotype prediction from genotype data,” BMC bioinformatics, vol. 11, no. 8, p. S3, 2010.

[40] C. Rosen and E. Shihab, “What are mobile developers asking about? a large scale study using stack overflow,” Empirical Software Engineering, 2016.

[41] S. Amershi, A. Begel, C. Bird, R. DeLine, H. Gall, E. Kamar, N. Nagappan, B. Nushi, and T. Zimmermann, “Software engineering for machine learning: a case study,” in Proceedings of the 41st International Conference on Software Engineering: Software Engineering in Practice. IEEE Press, 2019, pp. 291–300.

[42] Alexandr Honchar, “Using Caffe with your own dataset,” 2017, https://medium.com/machine-learning-world/using-caffe-with-your-own-dataset-b0ade5d71233

[43] A. Chakarov, A. Nori, S. Rajamani, S. Sen, and D. Vijaykeerthy, “Debugging machine learning tasks,” arXiv preprint arXiv:1603.07292, 2016.

[44] TensorFlow, “Debugging TensorFlow Programs,” 2016, https://www.tensorflow.org/programmers_guide/debugger

[45] S. Saito and S. Roy, “Effects of loss functions and target representations on adversarial robustness,” arXiv preprint arXiv:1812.00181, 2018.

[46] S. Amann, S. Nadi, H. A. Nguyen, T. N. Nguyen, and M. Mezini, “Mubench: A benchmark for api-misuse detectors,” in Proceedings of the 13th International Conference on Mining Software Repositories, ser. MSR ’16. New York, NY, USA: ACM, 2016, pp. 464–467. [Online]. Available: http://doi.acm.org/10.1145/2901739.2903506

[47] H. W. Lilliefors, “On the kolmogorov-smirnov test for normality with mean and variance unknown,” Journal of the American statistical Association, vol. 62, no. 318, pp. 399–402, 1967.

[48] J. L. Berral-García, “A quick view on current techniques and machine learning algorithms for big data analytics,” in 2016 18th international conference on transparent optical networks (ICTON). IEEE, 2016, pp. 1–4.

Md Johirul Islam is a doctoral candidate at Iowa State University. His research interests include machine learning program analysis, software techniques for machine learning, and programming languages. He has published works...

Hridesh Rajan is the Kingland Professor in the Computer Science Department at Iowa State University (ISU) where he has been since 2005.

[1] [1]

Top 15 Frameworks for Machine Learning Experts,

kdnuggets, “Top 15 Frameworks for Machine Learning Experts,” 2016, https://www.kdnuggets.com/2016/04/ top-15-frameworks-machine-learning-experts.html

work page 2016

[2] [2]

Machine learning: The high-interest credit card of technical debt,

D. Sculley, T. Phillips, D. Ebner, V . Chaudhary, and M. Young, “Machine learning: The high-interest credit card of technical debt,” 2014

work page 2014

[3] [3]

Hidden technical debt in machine learning systems,

D. Sculley, G. Holt, D. Golovin, E. Davydov, T. Phillips, D. Ebner, V . Chaudhary, M. Young, J.-F. Crespo, and D. Dennison, “Hidden technical debt in machine learning systems,” in Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 2 , ser. NIPS’15. Cambridge, MA, USA: MIT Press, 2015, pp. 2503–2511. [Online]...

work page arXiv 2015

[4] [4]

What’s your ml test score? a rubric for ml production systems,

E. Breck, S. Cai, E. Nielsen, M. Salib, and D. Sculley, “What’s your ml test score? a rubric for ml production systems,” in NIPS Workshop on Reliable Machine Learning in the Wild , 2016

work page 2016

[5] [5]

How do programmers ask and answer questions on the web?: Nier track,

C. Treude, O. Barzilay, and M.-A. Storey, “How do programmers ask and answer questions on the web?: Nier track,” in Software Engineering (ICSE), 2011 33rd International Conference on . IEEE, 2011, pp. 804–807

work page 2011

[6] [6]

An empirical study on developer interactions in stackoverﬂow,

S. Wang, D. Lo, and L. Jiang, “An empirical study on developer interactions in stackoverﬂow,” in Proceedings of the 28th Annual ACM Symposium on Applied Computing . ACM, 2013, pp. 1019– 1024

work page 2013

[7] [7]

Sparrows and owls: Characterisation of expert behaviour in stackoverﬂow,

J. Yang, K. Tao, A. Bozzon, and G.-J. Houben, “Sparrows and owls: Characterisation of expert behaviour in stackoverﬂow,” in Interna- tional Conference on User Modeling, Adaptation, and Personalization . Springer, 2014, pp. 266–277

work page 2014

[8] [8]

Using and asking: APIs used in the android market and asked about in stackoverﬂow,

D. Kavaler, D. Posnett, C. Gibler, H. Chen, P . Devanbu, and V . Filkov, “Using and asking: APIs used in the android market and asked about in stackoverﬂow,” in International Conference on Social Informatics. Springer, 2013, pp. 405–418

work page 2013

[9] [9]

How do API changes trigger stack overﬂow discussions? a study on the android sdk,

M. Linares-V ´asquez, G. Bavota, M. Di Penta, R. Oliveto, and D. Poshyvanyk, “How do API changes trigger stack overﬂow discussions? a study on the android sdk,” in proceedings of the 22nd International Conference on Program Comprehension . ACM, 2014, pp. 83–94

work page 2014

[10] [10]

Selecting best answer: An empirical analysis on community question answering sites,

T. P . Sahu, N. K. Nagwani, and S. Verma, “Selecting best answer: An empirical analysis on community question answering sites,” IEEE Access, vol. 4, pp. 4797–4808, 2016

work page 2016

[11] [11]

What are developers talking about? an analysis of topics and trends in stack overﬂow,

A. Barua, S. W. Thomas, and A. E. Hassan, “What are developers talking about? an analysis of topics and trends in stack overﬂow,” Empirical Software Engineering, vol. 19, no. 3, pp. 619–654, 2014

work page 2014

[12] W. Wang and M. W. Godfrey, “Detecting api usage obstacles: A study of ios and android developer questions,” in Proceedings of the 10th Working Conference on Mining Software Repositories. IEEE Press, 2013, pp. 61–64

[13] M. Rebouças, G. Pinto, F. Ebert, W. Torres, A. Serebrenik, and F. Castor, “An empirical study on the usage of the swift programming language,” in Software Analysis, Evolution, and Reengineering (SANER), 2016 IEEE 23rd International Conference on, vol. 1. IEEE, 2016, pp. 634–638

[14] Y. Zhang, G. Yin, T. Wang, Y. Yu, and H. Wang, “Evaluating bug severity using crowd-based knowledge: An exploratory study,” in Proceedings of the 7th Asia-Pacific Symposium on Internetware. ACM, 2015, pp. 70–73

[15] D. Schenk and M. Lungu, “Geo-locating the knowledge transfer in stackoverflow,” in Proceedings of the 2013 International Workshop on Social Software Engineering. ACM, 2013, pp. 21–24

[16] C. Stanley and M. D. Byrne, “Predicting tags for stackoverflow posts,” in Proceedings of ICCM, vol. 2013, 2013

[17] T. McDonnell, B. Ray, and M. Kim, “An empirical study of api stability and adoption in the android ecosystem,” in Software Maintenance (ICSM), 2013 29th IEEE International Conference on. IEEE, 2013, pp. 70–79

[18] A. Baltadzhieva and G. Chrupała, “Predicting the quality of questions on stackoverflow,” in Proceedings of the International Conference Recent Advances in Natural Language Processing, 2015, pp. 32–40

[19] A. Joorabchi, M. English, and A. E. Mahdi, “Text mining stackoverflow: An insight into challenges and subject-related difficulties faced by computer science learners,” Journal of Enterprise Information Management, vol. 29, no. 2, pp. 255–275, 2016

[20] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, “Caffe: Convolutional architecture for fast feature embedding,” in Proceedings of the 22nd ACM international conference on Multimedia. ACM, 2014, pp. 675–678

[21] A. Candel, V. Parmar, E. LeDell, and A. Arora, “Deep learning with h2o,” H2O.ai Inc, 2016

[22] F. Chollet et al., “Keras,” https://github.com/fchollet/keras, 2015

[23] S. Owen and S. Owen, Mahout in action. Manning Shelter Island, NY, 2012

[24] X. Meng, J. Bradley, B. Yavuz, E. Sparks, S. Venkataraman, D. Liu, J. Freeman, D. Tsai, M. Amde, S. Owen et al., “Mllib: Machine learning in apache spark,” The Journal of Machine Learning Research, vol. 17, no. 1, pp. 1235–1241, 2016

[25] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg et al., “Scikit-learn: Machine learning in python,” Journal of machine learning research, vol. 12, no. Oct, pp. 2825–2830, 2011

[26] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard et al., “Tensorflow: A system for large-scale machine learning,” in OSDI, vol. 16, 2016, pp. 265–283

[27] J. Bergstra, F. Bastien, O. Breuleux, P. Lamblin, R. Pascanu, O. Delalleau, G. Desjardins, D. Warde-Farley, I. Goodfellow, A. Bergeron et al., “Theano: Deep learning on gpus with python,” in NIPS 2011, BigLearning Workshop, Granada, Spain, vol. 3. Citeseer, 2011

[28] R. Collobert, S. Bengio, and J. Mariéthoz, “Torch: a modular machine learning software library,” Idiap, Tech. Rep., 2002

[29] G. Holmes, A. Donkin, and I. H. Witten, “Weka: A machine learning workbench,” in Intelligent Information Systems, 1994. Proceedings of the 1994 Second Australian and New Zealand Conference on. IEEE, 1994, pp. 357–361

[30] L. Buitinck, G. Louppe, M. Blondel, F. Pedregosa, A. Mueller, O. Grisel, V. Niculae, P. Prettenhofer, A. Gramfort, J. Grobler, R. Layton, J. VanderPlas, A. Joly, B. Holt, and G. Varoquaux, “API design for machine learning software: experiences from the scikit-learn project,” in ECML PKDD Workshop: Languages for Data Mining and Machine Learning, 2013, p...

[31] Z. Wang, K. Liu, J. Li, Y. Zhu, and Y. Zhang, “Various frameworks and libraries of machine learning and deep learning: A survey,” Archives of Computational Methods in Engineering, pp. 1–24, 2019

[32] G. Nguyen, S. Dlugolinsky, M. Bobák, V. Tran, Á. L. García, I. Heredia, P. Malík, and L. Hluchý, “Machine learning and deep learning frameworks and libraries for large-scale data mining: a survey,” Artificial Intelligence Review, pp. 1–48, 2019

[33] S. Meldrum, S. A. Licorish, and B. T. R. Savarimuthu, “Crowdsourced knowledge on stack overflow: A systematic mapping study,” in Proceedings of the 21st International Conference on Evaluation and Assessment in Software Engineering. ACM, 2017, pp. 180–185

[34] Yufeng Guo, “The 7 Steps of Machine Learning,” 2017, https://towardsdatascience.com/the-7-steps-of-machine-learning-2877d7e5548e

[35] S. Lockyer, “Coding qualitative data,” The Sage encyclopedia of social science research methods, vol. 1, no. 1, pp. 137–138, 2004

[36] R. S. Life, “Qualitative data analysis,” 1994

[37] A. Strauss and J. Corbin, Basics of qualitative research. Sage publications, 1990

[38] K. A. Hallgren, “Computing inter-rater reliability for observational data: an overview and tutorial,” Tutorials in quantitative methods for psychology, vol. 8, no. 1, p. 23, 2012

[39] G. Guzzetta, G. Jurman, and C. Furlanello, “A machine learning pipeline for quantitative phenotype prediction from genotype data,” BMC bioinformatics, vol. 11, no. 8, p. S3, 2010

[40] C. Rosen and E. Shihab, “What are mobile developers asking about? a large scale study using stack overflow,” Empirical Software Engineering, 2016

Md Johirul Islam is a doctoral candidate at Iowa State University. His research interests include machine learning program analysis, software techniques for machine learning, and programming languages. He has published works...

[41] S. Amershi, A. Begel, C. Bird, R. DeLine, H. Gall, E. Kamar, N. Nagappan, B. Nushi, and T. Zimmermann, “Software engineering for machine learning: a case study,” in Proceedings of the 41st International Conference on Software Engineering: Software Engineering in Practice. IEEE Press, 2019, pp. 291–300

[42] Alexandr Honchar, “Using Caffe with your own dataset,” 2017, https://medium.com/machine-learning-world/using-caffe-with-your-own-dataset-b0ade5d71233

[43] A. Chakarov, A. Nori, S. Rajamani, S. Sen, and D. Vijaykeerthy, “Debugging machine learning tasks,” arXiv preprint arXiv:1603.07292, 2016

[44] Tensorflow, “Debugging TensorFlow Programs,” 2016, https://www.tensorflow.org/programmers_guide/debugger

[45] S. Saito and S. Roy, “Effects of loss functions and target representations on adversarial robustness,” arXiv preprint arXiv:1812.00181, 2018

[46] S. Amann, S. Nadi, H. A. Nguyen, T. N. Nguyen, and M. Mezini, “Mubench: A benchmark for api-misuse detectors,” in Proceedings of the 13th International Conference on Mining Software Repositories, ser. MSR ’16. New York, NY, USA: ACM, 2016, pp. 464–467. [Online]. Available: http://doi.acm.org/10.1145/2901739.2903506

[47] H. W. Lilliefors, “On the kolmogorov-smirnov test for normality with mean and variance unknown,” Journal of the American statistical Association, vol. 62, no. 318, pp. 399–402, 1967

[48] J. L. Berral-García, “A quick view on current techniques and machine learning algorithms for big data analytics,” in 2016 18th international conference on transparent optical networks (ICTON). IEEE, 2016, pp. 1–4

Hridesh Rajan is the Kingland Professor in the Computer Science Department at Iowa State University (ISU) where he has been since 2005. His r...