Communication means, such as issue trackers, mailing lists, QA forums, and app reviews, are premier means of collaboration among developers, and between developers and end-users. Analyzing such sources of information is crucial to build recommenders for developers, for example suggesting experts, re-documenting source code, or transforming user feedback in maintenance and evolution strategies for developers. To ease this analysis, in previous work we proposed Development Emails Content Analyzer (DECA), a tool based on Natural Language Parsing that classifies with high precision development emails' fragments according to their purpose. However, DECA has to be trained through a manual tagging of relevant patterns, which is often effort-intensive, error-prone and requires specific expertise in natural language parsing. In this paper, we first show, with an empirical study, the extent to which producing rules for identifying such patterns requires effort, depending on the nature and complexity of patterns. Then, we propose an approach, named Nlp-based softwarE dOcumentation aNalyzer (NEON), that automatically mines such rules, minimizing the manual effort. We assess the performances of NEON in the analysis and classification of mobile app reviews, developers discussions, and issues. NEON simplifies the patterns identification and rules definition processes, allowing a savings of more than 70 percent of the time otherwise spent on performing such activities manually. Results also show that NEON-generated rules are close to the manually identified ones, achieving comparable recall.

Exploiting Natural Language Structures in Software Informal Documentation

Di Sorbo, Andrea;Visaggio, Corrado Aaron;Di Penta, Massimiliano;Canfora, Gerardo;
2021-01-01

Abstract

Communication means, such as issue trackers, mailing lists, QA forums, and app reviews, are premier means of collaboration among developers, and between developers and end-users. Analyzing such sources of information is crucial to build recommenders for developers, for example suggesting experts, re-documenting source code, or transforming user feedback in maintenance and evolution strategies for developers. To ease this analysis, in previous work we proposed Development Emails Content Analyzer (DECA), a tool based on Natural Language Parsing that classifies with high precision development emails' fragments according to their purpose. However, DECA has to be trained through a manual tagging of relevant patterns, which is often effort-intensive, error-prone and requires specific expertise in natural language parsing. In this paper, we first show, with an empirical study, the extent to which producing rules for identifying such patterns requires effort, depending on the nature and complexity of patterns. Then, we propose an approach, named Nlp-based softwarE dOcumentation aNalyzer (NEON), that automatically mines such rules, minimizing the manual effort. We assess the performances of NEON in the analysis and classification of mobile app reviews, developers discussions, and issues. NEON simplifies the patterns identification and rules definition processes, allowing a savings of more than 70 percent of the time otherwise spent on performing such activities manually. Results also show that NEON-generated rules are close to the manually identified ones, achieving comparable recall.
2021
empirical study, Mining unstructured data, natural language processing
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.12070/45397
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 11
  • ???jsp.display-item.citation.isi??? 9
social impact