arxiv: 2402.16391 · v4 · submitted 2024-02-26 · 💻 cs.SE

Industry Practitioners Perspectives on AI Model Quality: Perceptions, Challenges, and Solutions

Pith reviewed 2026-05-24 04:20 UTC · model grok-4.3

classification 💻 cs.SE
keywords AI model quality · practitioner perspectives · quality attributes · data imbalance · active learning · software engineering · AI deployment

The pith

AI model quality priorities shift by application context according to industry interviews.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper reports that industry practitioners do not treat all quality attributes equally; their priorities depend on the specific use case, such as favoring efficiency over correctness in real-time systems. Data imbalance is identified as a key challenge to correctness and robustness, with active learning as a common mitigation. The findings from 15 interviews are validated by a survey of 50 practitioners, which suggests these views are widely shared. A sympathetic reader would care because the results can steer AI research toward the attributes that matter most in practice instead of assuming that correctness is universally the top priority.

Core claim

Through interviews with 15 AI practitioners, the paper finds that practitioners prioritize quality attributes differently depending on context. For instance, efficiency can be more important than correctness in real-time applications, while scalability and deployability are no longer primary concerns. Data imbalance is a major obstacle to maintaining model correctness and robustness, and practitioners often use strategies like active learning to mitigate it. These findings are largely confirmed by a survey of 50 practitioners.
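To make the active-learning mitigation concrete, here is a minimal sketch of pool-based uncertainty sampling on an imbalanced toy dataset. It is an illustration only: the paper does not say which active-learning variant practitioners use, and the nearest-centroid model, the synthetic 95/5 data, and the function names (`fit_centroids`, `margin`, `uncertainty_sampling`) are all assumptions made for this example.

```python
import random


def fit_centroids(points, labels, labeled_idx):
    """Nearest-centroid model on 1-D features: the mean of each labeled class."""
    means = {}
    for cls in (0, 1):
        xs = [points[i] for i in labeled_idx if labels[i] == cls]
        means[cls] = sum(xs) / len(xs)
    return means


def margin(x, means):
    """Gap between the distances to the two centroids; a small gap means uncertain."""
    return abs(abs(x - means[0]) - abs(x - means[1]))


def uncertainty_sampling(points, labels, seed_idx, budget):
    """Repeatedly query the label of the most uncertain unlabeled point."""
    seed = set(seed_idx)
    labeled = list(seed_idx)
    unlabeled = [i for i in range(len(points)) if i not in seed]
    for _ in range(budget):
        means = fit_centroids(points, labels, labeled)
        pick = min(unlabeled, key=lambda i: margin(points[i], means))
        unlabeled.remove(pick)
        labeled.append(pick)  # the "oracle" reveals labels[pick]
    return labeled


# Hypothetical imbalanced pool: 95 majority points, 5 minority points.
rng = random.Random(0)
points = [rng.gauss(0.0, 1.0) for _ in range(95)] + [rng.gauss(4.0, 1.0) for _ in range(5)]
labels = [0] * 95 + [1] * 5

# Start from one labeled example per class, then spend a budget of 10 queries.
chosen = uncertainty_sampling(points, labels, seed_idx=[0, 95], budget=10)
print(sorted(chosen))
```

The design choice being illustrated is that queries concentrate near the current decision boundary, so labeling effort tends to go where the two classes are most easily confused instead of being spread across the abundant majority class.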

What carries the argument

Context-dependent prioritization of nine key quality attributes, revealed through practitioner interviews and validated by survey.

If this is right

  • Researchers should focus on attributes practitioners value most, such as efficiency in certain contexts.
  • Improving one attribute should not come at the expense of others considered more critical.
  • Data imbalance mitigation techniques like active learning should be further developed.
  • Scalability and deployability may receive less attention in future AI development.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Quality assessment frameworks for AI may need to be customizable based on application domain.
  • This suggests potential trade-offs in model development that current benchmarks do not capture.
  • Future studies could observe actual deployed models to verify self-reported practices.
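One way to picture a quality assessment that is customizable by application domain is a weighted score whose weights change with context. The sketch below is hypothetical throughout: the contexts, weights, model names, and attribute scores are invented for illustration and do not come from the paper, and only three of the attributes named in the text are used.

```python
# Hypothetical per-context weights over three quality attributes (each row sums to 1).
WEIGHTS = {
    "real_time": {"correctness": 0.3, "efficiency": 0.6, "robustness": 0.1},
    "offline_batch": {"correctness": 0.7, "efficiency": 0.1, "robustness": 0.2},
}

# Hypothetical normalized attribute scores in [0, 1] for two candidate models.
MODELS = {
    "large_model": {"correctness": 0.95, "efficiency": 0.40, "robustness": 0.80},
    "small_model": {"correctness": 0.85, "efficiency": 0.90, "robustness": 0.75},
}


def weighted_score(scores, weights):
    """Weighted sum of attribute scores under one context's weights."""
    return sum(weights[attr] * scores[attr] for attr in weights)


def rank(context):
    """Return model names ordered from best to worst for the given context."""
    weights = WEIGHTS[context]
    return sorted(MODELS, key=lambda m: weighted_score(MODELS[m], weights), reverse=True)


for context in WEIGHTS:
    print(context, rank(context))
```

With these made-up numbers the faster model ranks first in the real-time context and the more accurate model ranks first in the offline context, which mirrors the kind of trade-off a single fixed benchmark would not surface.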

Load-bearing premise

The 15 interviewed and 50 surveyed practitioners represent the broader population of industry AI practitioners and their self-reports match actual practices.

What would settle it

A study finding that a majority of practitioners still consider scalability a primary concern across contexts would contradict the claims.

Figures

Figures reproduced from arXiv: 2402.16391 by Chenyu Wang, Daniela Damian, David Lo, Yunbo Lyu, Ze Shi Li, Zhou Yang.

Figure 1: The workflow of our study.
Figure 2: The ranking result of each QA4AI property.

Original abstract

Artificial Intelligence (AI) is now used across nearly every industry, making AI model quality essential for building reliable and trustworthy systems. Historically, correctness has been the main focus, but industry AI models must also satisfy many other important quality attributes. To understand how these attributes are perceived, the challenges they create, and the solutions used in practice, we identify nine key quality attributes and interview 15 AI practitioners from diverse backgrounds. The interviews show that practitioners prioritize attributes differently depending on context. For example, efficiency can matter more than correctness in real-time applications, while scalability and deployability are no longer seen as primary concerns. Data imbalance emerges as a major obstacle to maintaining model correctness and robustness, and practitioners commonly use mitigation strategies such as active learning. We validate our main findings with a survey of 50 practitioners, which shows that most of the findings are widely recognized. These results can help researchers focus on the attributes practitioners value most and avoid improving one attribute at the expense of others that are considered more critical.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 0 minor

Summary. The paper claims that AI model quality involves nine key attributes beyond correctness; interviews with 15 practitioners reveal context-dependent prioritization (e.g., efficiency over correctness in real-time settings; scalability/deployability no longer primary), with data imbalance as a major obstacle to correctness/robustness and active learning as a common mitigation; a follow-up survey of 50 practitioners validates that most findings are widely recognized.

Significance. If the empirical claims hold, the work could usefully redirect research attention toward practitioner-valued attributes and trade-offs. The mixed-methods design (interviews plus validation survey) is a strength when methods are transparent.

major comments (3)
  1. [Abstract, §3] Abstract and §3 (Methods): the central generalization claims (context-dependent prioritization, data imbalance as 'major obstacle', active learning as 'commonly used') rest on the untested representativeness of the 15-interviewee convenience sample plus 50-survey respondents. No information is supplied on recruitment method, response rate, stratification by role/company size/domain, or external validation of self-reports against observed practice; this directly undermines the load-bearing assumption identified in the stress-test note.
  2. [Abstract] Abstract: the process for identifying the nine quality attributes is not described (e.g., whether derived from prior literature, pilot interviews, or thematic analysis of the 15 transcripts). Without this, it is impossible to assess whether the attribute set is exhaustive or biased toward the sampled practitioners.
  3. [Abstract] Abstract and validation paragraph: the survey is said to show that 'most of the findings are widely recognized,' yet no quantitative results, response distributions, or statistical tests are referenced; this leaves the validation claim unsupported.

Simulated Authors' Rebuttal

3 responses · 1 unresolved

We thank the referee for the constructive feedback. We address each major comment below, agreeing where revisions are needed to improve transparency while defending the exploratory nature of the mixed-methods design.

point-by-point responses
  1. Referee: [Abstract, §3] Abstract and §3 (Methods): the central generalization claims rest on the untested representativeness of the 15-interviewee convenience sample plus 50-survey respondents. No information is supplied on recruitment method, response rate, stratification by role/company size/domain, or external validation of self-reports against observed practice.

    Authors: We agree the manuscript should provide more methodological transparency. The sample is a convenience sample recruited via professional networks and LinkedIn, which is standard for qualitative SE studies; we do not claim statistical representativeness but present context-specific insights. We will revise §3 to detail recruitment, participant roles/domains, and add a limitations paragraph on generalizability and self-report nature. Response rate is not applicable as it was not a closed survey. revision: yes

  2. Referee: [Abstract] Abstract: the process for identifying the nine quality attributes is not described (e.g., whether derived from prior literature, pilot interviews, or thematic analysis of the 15 transcripts).

    Authors: The nine attributes emerged from thematic analysis of the interview data, cross-referenced with prior literature on software quality attributes (e.g., ISO 25010 extensions for ML). We will revise the abstract and §3 to explicitly describe the identification process, including coding approach and how saturation was assessed. revision: yes

  3. Referee: [Abstract] Abstract and validation paragraph: the survey is said to show that 'most of the findings are widely recognized,' yet no quantitative results, response distributions, or statistical tests are referenced.

    Authors: We agree this claim requires supporting data. The survey used Likert-scale items; we will add response distributions (e.g., % agreement per finding) and any relevant descriptive statistics in the revised validation section. revision: yes
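The kind of per-finding summary discussed in this exchange (a response distribution plus percent agreement for a Likert item) can be sketched as follows. The responses are hypothetical placeholders, not the paper's survey data, and treating 4 and 5 on a 5-point scale as "agreement" is an assumption of this example.

```python
from collections import Counter


def agreement_summary(responses, agree_threshold=4):
    """Return (distribution, percent agreement) for one 5-point Likert item."""
    counts = Counter(responses)
    distribution = {level: counts.get(level, 0) for level in range(1, 6)}
    agree = sum(n for level, n in distribution.items() if level >= agree_threshold)
    return distribution, 100.0 * agree / len(responses)


# Hypothetical answers from 50 respondents to a single finding.
hypothetical = [5] * 18 + [4] * 17 + [3] * 8 + [2] * 5 + [1] * 2
dist, pct = agreement_summary(hypothetical)
print(dist, f"{pct:.0f}% agree")
```

Reporting the full distribution alongside the single percentage lets a reader see whether "agreement" is driven by strong or mild endorsement.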

standing simulated objections not resolved
  • External validation of self-reports against observed practice is unavailable given the interview/survey design.

Circularity Check

0 steps flagged

No circularity: empirical interview/survey study with no derivation chain

full rationale

The paper reports practitioner perspectives obtained through 15 interviews and a follow-up survey of 50 respondents. No equations, fitted parameters, predictions, or mathematical derivations appear in the provided text. Claims about attribute prioritization, data imbalance, and mitigation strategies are presented as direct outputs of the collected responses, not as results that reduce to their own inputs by construction. Load-bearing self-citation, ansatz smuggling, and renaming of known results are all absent. Representativeness of the sample is a validity issue for generalization, but it does not create circularity in any derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The study depends on standard assumptions of qualitative research methods without new free parameters or invented entities.

axioms (1)
  • domain assumption: Self-reported data from interviews and surveys accurately captures practitioners' perceptions and practices.
    Central to interpreting the findings as reflective of industry realities.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Results-Actionability Gap: Understanding How Practitioners Evaluate LLM Products in the Wild

    cs.SE 2026-01 conditional novelty 7.0

    Qualitative study of 19 practitioners reveals ten LLM product evaluation practices and introduces the results-actionability gap as a key barrier to turning findings into improvements.

Reference graph

Works this paper leans on

144 extracted references · 144 canonical work pages · cited by 1 Pith paper · 12 internal anchors

  1. [1]

    [n. d.]. Apache Ignite. https://ignite.apache.org/

  2. [2]

    [n. d.]. Apache Spark. https://spark.apache.org/

  3. [3]

    [n. d.]. ChatGPT is easily abused, and that’s a big problem. https://adguard.com/en/blog/chatgpt-dan-prompt-abuse.html

  4. [4]

    [n. d.]. Kubernetes. https://kubernetes.io/

  5. [5]

    [n. d.]. NVIDIA CUDA toolkit. https://developer.nvidia.com/cuda-toolkit

  6. [6]

    [n. d.]. NVIDIA TensorRT. https://developer.nvidia.com/tensorrt

  7. [7]

    [n. d.]. NVIDIA Triton Inference Server. https://developer.nvidia.com/nvidia-triton-inference-server

  8. [8]

    [n. d.]. Personal Data Protection Act. https://www.pdpc.gov.sg/Overview-of-PDPA/The-Legislation/Personal-Data-Protection-Act

  9. [9]

    [n. d.]. Pinecone. https://www.pinecone.io/

  10. [10]

    [n. d.]. PyTorch. https://pytorch.org/

  11. [11]

    [n. d.]. Seldon. https://www.seldon.io/

  12. [12]

    [n. d.]. TensorFlow. https://www.tensorflow.org/

  13. [13]

    History of the Basel Committee

    2014. History of the Basel Committee. https://www.bis.org/bcbs/history.htm

  14. [14]

    ISO 9001:2015

    2015. ISO 9001:2015. https://www.iso.org/standard/62085.html

  15. [15]

    General Data Protection Regulation (GDPR)

    2022. General Data Protection Regulation (GDPR). https://gdpr-info.eu/

  16. [16]

    Martin Abadi, Andy Chu, Ian Goodfellow, H Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. 2016. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC conference on computer and communications security. 308–318

  17. [17]

    Ibrahim M Ahmed and Manar Younis Kashmoola. 2021. Threats on machine learning technique by data poisoning attack: A survey. In Advances in Cyber Security: Third International Conference, ACeS 2021, Penang, Malaysia, August 24–25, 2021, Revised Selected Papers 3. Springer, 586–600

  18. [19]

    Saleema Amershi, Andrew Begel, Christian Bird, Robert DeLine, Harald Gall, Ece Kamar, Nachiappan Nagappan, Besmira Nushi, and Thomas Zimmermann. 2019. Software Engineering for Machine Learning: A Case Study. In 2019 IEEE/ACM 41st International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP). 291–300. https://doi.org/10.11...

  19. [20]

    Shin Ando and Chun-Yuan Huang. 2017. Deep Over-sampling Framework for Classifying Imbalanced Data. arXiv:1704.07515 [cs.LG]

  20. [21]

    Alejandro Barredo Arrieta, Natalia Díaz-Rodríguez, Javier Del Ser, Adrien Bennetot, Siham Tabik, Alberto Barbado, Salvador García, Sergio Gil-López, Daniel Molina, Richard Benjamins, Raja Chatila, and Francisco Herrera. 2019. Explainable Artificial Intelligence (XAI): Concepts, Taxonomies, Opportunities and Challenges toward Responsible AI. arXiv:1910.100...

  21. [22]

    Muhammad Hilmi Asyrofi, Zhou Yang, Imam Nur Bani Yusuf, Hong Jin Kang, Ferdian Thung, and David Lo. 2022. BiasFinder: Metamorphic Test Generation to Uncover Bias for Sentiment Analysis Systems. IEEE Transactions on Software Engineering 48, 12 (2022), 5087–5101. https://doi.org/10.1109/TSE.2021.3136169

  22. [23]

    Yang Bao, Gilles Hilary, and Bin Ke. 2022. Artificial intelligence and fraud detection. Innovative Technology at the Interface of Finance and Operations: Volume I (2022), 223–247

  23. [24]

    Hollen Barmer, Rachel Dzombak, Matthew Gaston, Vijaykumar Palat, Frank Redner, Tanisha Smith, and John Wohlbier. Scalable AI. (9 2021). https://doi.org/10.1184/R1/16560273.v1

  25. [26]

    Mohammad Riyaz Belgaum, Zainab Alansari, Shahrulniza Musa, Muhammad Mansoor Alam, and MS Mazliham. 2021. Role of artificial intelligence in cloud computing, IoT and SDN: Reliability and scalability issues. International Journal of Electrical and Computer Engineering 11, 5 (2021), 4458

  26. [27]

    Kartikeya Bhardwaj, Naveen Suda, and Radu Marculescu. 2019. Dream Distillation: A Data-Independent Model Compression Framework. arXiv:1905.07072 [stat.ML]

  27. [28]

    Eric Breck, Shanqing Cai, Eric Nielsen, Michael Salib, and D. Sculley. 2017. The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction. In Proceedings of IEEE Big Data

  28. [29]

    Christian Cabrera, Andrei Paleyes, Pierre Thodoroff, and Neil D. Lawrence. 2023. Real-world Machine Learning Systems: A survey from a Data-Oriented Architecture Perspective. arXiv:2302.04810 [cs.SE]

  29. [30]

    Longbing Cao. 2021. AI in Finance: Challenges, Techniques and Opportunities. arXiv:2107.09051 [q-fin.CP]

  30. [31]

    Longbing Cao. 2022. AI in Finance: Challenges, Techniques, and Opportunities. ACM Comput. Surv. 55, 3, Article 64 (feb 2022), 38 pages. https://doi.org/10.1145/3502289

  31. [32]

    Nicholas Carlini, Florian Tramer, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Ulfar Erlingsson, Alina Oprea, and Colin Raffel. 2021. Extracting Training Data from Large Language Models. arXiv:2012.07805 [cs.CR]

  32. [33]

    N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer. 2002. SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research 16 (jun 2002), 321–357. https://doi.org/10.1613/jair.953

  33. [34]

    Karel Crombecq, Luciano De Tommasi, Dirk Gorissen, and Tom Dhaene. 2009. A novel sequential design strategy for global surrogate modeling. In Proceedings of the 2009 Winter Simulation Conference (WSC). 731–742. https://doi.org/10.1109/WSC.2009.5429687

  34. [35]

    Daniela S. Cruzes and Tore Dyba. 2011. Recommended Steps for Thematic Synthesis in Software Engineering. In Proceedings of the 2011 International Symposium on Empirical Software Engineering and Measurement (ESEM ’11). IEEE Computer Society, USA, 275–284. https://doi.org/10.1109/ESEM.2011.36

  35. [36]

    Arun Das and Paul Rad. 2020. Opportunities and challenges in explainable artificial intelligence (xai): A survey. arXiv preprint arXiv:2006.11371 (2020)

  36. [37]

    Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv:1810.04805 [cs.CL]

  37. [38]

    Yuanrui Fan, Xin Xia, David Lo, Ahmed E Hassan, and Shanping Li. 2021. What makes a popular academic AI repository? Empirical Software Engineering 26, 1 (2021), 1–35

  38. [39]

    Michael Felderer and Rudolf Ramler. 2021. Quality Assurance for AI-Based Systems: Overview and Challenges (Introduction to Interactive Session). In Software Quality: Future Perspectives on Software Engineering Quality. Springer International Publishing, 33–42. https://doi.org/10.1007/978-3-030-65854-0_3

  39. [40]

    Yang Feng, Qingkai Shi, Xinyu Gao, Jun Wan, Chunrong Fang, and Zhenyu Chen. 2020. DeepGini: Prioritizing Massive Tests to Enhance the Robustness of Deep Neural Networks. In Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis (Virtual Event, USA) (ISSTA 2020). Association for Computing Machinery, New York, NY, USA, ...

  40. [41]

    Stefan Feuerriegel, Mateusz Dolata, and Gerhard Schwabe. 2020. Fair AI: Challenges and opportunities. Business & information systems engineering 62 (2020), 379–384

  41. [42]

    Aaron Fisher, Cynthia Rudin, and Francesca Dominici. 2019. All Models are Wrong, but Many are Useful: Learning a Variable’s Importance by Studying an Entire Class of Prediction Models Simultaneously. arXiv:1801.01489 [stat.ME]

  42. [43]

    Matt Fredrikson, Somesh Jha, and Thomas Ristenpart. 2015. Model inversion attacks that exploit confidence information and basic countermeasures. In Proceedings of the 22nd ACM SIGSAC conference on computer and communications security. 1322–1333

  43. [44]

    Jerome Friedman. 2000. Greedy Function Approximation: A Gradient Boosting Machine. The Annals of Statistics 29 (11 2000). https://doi.org/10.1214/aos/1013203451

  44. [45]

    Shipeng Fu, Zhen Li, Kai Liu, Sadia Din, Muhammad Imran, and Xiaomin Yang. 2020. Model Compression for IoT Applications in Industry 4.0 via Multiscale Knowledge Transfer. IEEE Transactions on Industrial Informatics 16, 9 (2020), 6013–6022. https://doi.org/10.1109/TII.2019.2953106

  45. [46]

    Zhe Fu, Jingyu Yang, Changming Bai, Xiao Chen, Cun Zhang, Yanlin Zhang, and Dongsheng Wang. 2020. Astraea: Deploy AI Services at the Edge in Elegant Ways. In 2020 IEEE International Conference on Edge Computing (EDGE). 49–53. https://doi.org/10.1109/EDGE50951.2020.00015

  46. [47]

    Amin Ghadesi, Maxime Lamothe, and Heng Li. 2023. What Causes Exceptions in Machine Learning Applications? Mining Machine Learning-Related Stack Traces on Stack Overflow. arXiv:2304.12857 [cs.LG]

  47. [48]

    Alex Goldstein, Adam Kapelner, Justin Bleich, and Emil Pitkin. 2014. Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation. arXiv:1309.6392 [stat.AP]

  48. [49]

    Chen Gong, Zhou Yang, Yunpeng Bai, Jieke Shi, Arunesh Sinha, Bowen Xu, David Lo, Xinwen Hou, and Guoliang Fan. Curiosity-Driven and Victim-Aware Adversarial Policies. In Proceedings of the 38th Annual Computer Security Applications Conference (Austin, TX, USA) (ACSAC ’22). Association for Computing Machinery, New York, NY, USA, 186–200. https://doi.org/10.1145/3564625.3564636

  50. [51]

    Generative Adversarial Networks

    Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative Adversarial Networks. arXiv:1406.2661 [stat.ML]

  51. [52]

    Leo Goodman. 1961. Snowball Sampling. Ann Math Stat 32 (03 1961). https://doi.org/10.1214/aoms/1177705148

  52. [53]

    Serge Gorbunov and Arnold Rosenbloom. 2010. Autofuzz: Automated network protocol fuzzing framework. Ijcsns 10, 8 (2010), 239

  53. [54]

    Philip Gross, Albert Boulanger, Marta Arias, David L. Waltz, Philip M. Long, Charles Lawson, Roger Anderson, Matthew Koenig, Mark Mastrocinque, William Fairechio, John A. Johnson, Serena Lee, Frank Doherty, and Arthur Kressner. 2006. Predicting Electricity Distribution Feeder Failures Using Machine Learning Susceptibility Analysis. In IAAI. http://www.phi...

  54. [55]

    Greg Guest, Arwen Bunce, and Laura Johnson. 2006. How Many Interviews Are Enough?: An Experiment with Data Saturation and Variability. Field Methods 18, 1 (Feb. 2006), 59–82. https://doi.org/10.1177/1525822X05279903 Publisher: SAGE Publications Inc

  55. [56]

    Michelle Guo, Albert Haque, De-An Huang, Serena Yeung, and Li Fei-Fei. 2018. Dynamic Task Prioritization for Multitask Learning. In Proceedings of the European Conference on Computer Vision (ECCV).

  56. [57]

    Ronan Hamon, Henrik Junklewitz, Ignacio Sanchez, et al. 2020. Robustness and explainability of artificial intelligence. Publications Office of the European Union 207 (2020)

  57. [58]

    Hui Han, Wen-Yuan Wang, and Bing-Huan Mao. 2005. Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning. In Proceedings of the 2005 International Conference on Advances in Intelligent Computing - Volume Part I (Hefei, China) (ICIC’05). Springer-Verlag, Berlin, Heidelberg, 878–887. https://doi.org/10.1007/11538059_91

  58. [59]

    Miriam Harris, Amy Qi, Luke Jeagal, Nazi Torabi, Dick Menzies, Alexei Korobitsyn, Madhukar Pai, Ruvandhi R Nathavitharana, and Faiz Ahmad Khan. 2019. A systematic review of the diagnostic accuracy of artificial intelligence-based computer programs to analyze chest x-rays for pulmonary tuberculosis. PloS one 14, 9 (2019), e0221339

  59. [60]

    Mardhiya Hayati, Siti Mutmainah, and Syed Ghufran. 2021. Random and Synthetic Over-Sampling Approach to Resolve Data Imbalance in Classification. International Journal of Artificial Intelligence Research 4 (01 2021), 86. https://doi.org/10.29099/ijair.v4i2.152

  60. [61]

    Zecheng He, Tianwei Zhang, and Ruby B Lee. 2019. Model inversion attacks against collaborative inference. In Proceedings of the 35th Annual Computer Security Applications Conference. 148–162

  61. [62]

    M.A. Hearst, S.T. Dumais, E. Osuna, J. Platt, and B. Scholkopf. 1998. Support vector machines. IEEE Intelligent Systems and their Applications 13, 4 (1998), 18–28. https://doi.org/10.1109/5254.708428

  62. [63]

    Lukas Heiland, Marius Hauser, and Justus Bogner. 2023. Design Patterns for AI-based Systems: A Multivocal Literature Review and Pattern Repository. arXiv:2303.13173 [cs.SE]

  63. [64]

    Henrik Heymann, Hendrik Mende, Maik Frye, and Robert H. Schmitt. 2023. Assessment Framework for Deployability of Machine Learning Models in Production. Procedia CIRP 118 (2023), 32–37. https://doi.org/10.1016/j.procir.2023.06.007 16th CIRP Conference on Intelligent Computation in Manufacturing Engineering

  64. [65]

    Hans-Martin Heyn, Eric Knauss, Amna Pir Muhammad, Olof Eriksson, Jennifer Linder, Padmini Subbiah, Shameer Kumar Pradhan, and Sagar Tungal. 2021. Requirement Engineering Challenges for AI-intense Systems Development. In 2021 IEEE/ACM 1st Workshop on AI Engineering - Software Engineering for AI (WAIN). 89–96. https://doi.org/10.1109/WAIN52551.2021.00020

  65. [66]

    Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. 2015. Distilling the Knowledge in a Neural Network. arXiv:1503.02531 [stat.ML]

  66. [67]

    Carrie Howell, Wei Su, Ariann Nassel, April Agne, and Andrea Cherrington. 2020. Area based stratified random sampling using geospatial technology in a community-based survey. BMC Public Health 20 (11 2020). https://doi.org/10.1186/s12889-020-09793-0

  67. [68]

    Krystal Hu. 2023. CHATGPT sets record for fastest-growing user base - analyst note. https://www.reuters.com/technology/chatgpt-sets-record-fastest-growing-user-base-analyst-note-2023-02-01/

  68. [69]

    Shotaro Ishihara. 2023. Training Data Extraction From Pre-trained Language Models: A Survey. arXiv:2305.16157 [cs.CL]

  69. [70]

    Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, and Dmitry Kalenichenko. 2017. Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference. arXiv:1712.05877 [cs.LG]

  70. [71]

    Jean-Marie John-Mathews, Dominique Cardon, and Christine Balagué. 2022. From reality to world. A critical perspective on AI fairness. Journal of Business Ethics 178, 4 (July 2022), 945–959. https://doi.org/10.1007/s10551-022-05055-8 FNEGE 1, HCERES A, ABS 3

  71. [72]

    Milan Jovic, Andrea Adamoli, and Matthias Hauswirth. 2011. Catch me if you can: performance bug detection in the wild. In Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications. 155–170

  72. [73]

    Reza karemi and mohammadreza nasiri. 2023. Identifying and Prioritizing Factors Affecting Knowledge Sharing in Software Companies. Sciences and Techniques of Information Management (2023), –. https://doi.org/10.22091/stim.2023.10146.2043

  73. [74]

    Sanjay Kariyappa and Moinuddin K Qureshi. 2019. Defending Against Model Stealing Attacks with Adaptive Misinformation. arXiv:1911.07100 [stat.ML]

  74. [76]

    Jinhan Kim, Robert Feldt, and Shin Yoo. 2019. Guiding Deep Learning System Testing Using Surprise Adequacy. In 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE). IEEE. https://doi.org/10.1109/icse.2019.00108

  75. [77]

    Segment Anything

    Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár, and Ross Girshick. 2023. Segment Anything. arXiv:2304.02643 [cs.CV]

  76. [78]

    Pavneet Singh Kochhar, Xin Xia, and David Lo. 2019. Practitioners’ Views on Good Software Testing Practices. In 2019 IEEE/ACM 41st International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP). 61...

  77. [79]

    Taesung Lee, Benjamin Edwards, Ian Molloy, and Dong Su. 2018. Defending Against Machine Learning Model Stealing Attacks Using Deceptive Perturbations. arXiv:1806.00054 [cs.LG]

  78. [80]

    Jing Li, Aixin Sun, Jianglei Han, and Chenliang Li. 2022. A Survey on Deep Learning for Named Entity Recognition. IEEE Transactions on Knowledge and Data Engineering 34, 1 (jan 2022), 50–70. https://doi.org/10.1109/tkde.2020.2981314

  79. [81]

    Jenny T. Liang, Maryam Arab, Minhyuk Ko, Amy J. Ko, and Thomas D. LaToza. 2023. A Qualitative Study on the Implementation Design Decisions of Developers. arXiv:2301.09789 [cs.SE]

  80. [82]

    Bowen Liu, Boao Xiao, Xutong Jiang, Siyuan Cen, Xin He, Wanchun Dou, and Huaming Chen. 2023. Adversarial Attacks on Large Language Model-Based System and Mitigating Strategies: A Case Study on ChatGPT. Sec. and Commun. Netw. 2023 (jan 2023), 10 pages. https://doi.org/10.1155/2023/8691095

Showing first 80 references.