Towards Understanding Developers' Machine-Learning Challenges: A Multi-Language Study on Stack Overflow
Antoniol G.; Di Penta M.
2021-01-01
Abstract
Machine Learning (ML) is increasingly used as an essential component of modern software systems. Moreover, the maturity of the adopted techniques and the availability of frameworks have changed the way developers approach ML-related development problems. By analyzing Stack Overflow (SO) posts related to ML, this paper investigates how questions about ML have changed over the years and across six different programming languages. We analyzed 43,950 SO posts from the period 2008-2020, studying (i) how the number of ML-related posts changes over time for each programming language, (ii) how the posts are distributed across the different phases of an ML pipeline, and (iii) whether posts belonging to different languages or phases are more or less challenging to address. We found that some programming languages are fading while others are becoming more popular in ML development. While model-building questions are the most discussed in general, the level of challenge posed by the other phases of the ML pipeline appears to be language-dependent. The results of this work could be used to better understand ML challenges in different programming languages and, possibly, to improve ML tutorials for those languages.