A Topic Modeling Approach to Evaluate the Comments Consistency to Source Code

Iammarino M.; Aversano L.; Bernardi M. L.
2020-01-01

Abstract

A significant portion of the source code in software systems consists of comments, text that the compiler ignores. Code comments are a primary source of system documentation. They are crucial to software maintainers as a basis for code traceability and maintenance activities, and also to developers who use the code as a library or framework in other projects. Although many software developers consider comments important, existing approaches to software quality analysis mostly disregard them and focus only on source code. This paper presents a topic-modeling approach for analyzing the consistency of comments with the source code. A model is proposed to assess comment quality in terms of consistency, since comments should be consistent with the source code they refer to. The results show a similar trend in topic distribution, and almost all classes are associated with no more than three topics.
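
To make the notion of comment-to-code consistency concrete, the following is a minimal sketch and not the method of the paper. It substitutes plain term-frequency cosine similarity for the topic distributions the abstract refers to, assumes Java-style comment syntax, and uses hypothetical names throughout (`consistency`, `tokenize`, the stopword list, and the two sample snippets).

```python
import math
import re
from collections import Counter

# Java-style block and line comments (an assumption; the abstract names no language).
COMMENT_RE = re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL)

# Tiny illustrative stopword list mixing English words and Java keywords.
STOPWORDS = {"the", "and", "for", "this", "public", "private",
             "class", "return", "void", "int", "new"}


def split_comments_and_code(source):
    """Return (comment_text, code_text) for one source file."""
    comments = " ".join(COMMENT_RE.findall(source))
    code = COMMENT_RE.sub(" ", source)
    return comments, code


def tokenize(text):
    """Split on non-letters and camelCase, lowercase, drop short words and stopwords."""
    words = re.findall(r"[A-Za-z][a-z]*", text)
    return [w.lower() for w in words
            if len(w) > 2 and w.lower() not in STOPWORDS]


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def consistency(source):
    """Score in [0, 1]: how much comment vocabulary overlaps code vocabulary."""
    comments, code = split_comments_and_code(source)
    return cosine(Counter(tokenize(comments)), Counter(tokenize(code)))


# Hypothetical snippets: the first comment describes the method, the second does not.
CONSISTENT = """
// Returns the balance of the account.
public int getAccountBalance() { return balance; }
"""
INCONSISTENT = """
// Sends an email to the customer.
public int getAccountBalance() { return balance; }
"""

if __name__ == "__main__":
    print(round(consistency(CONSISTENT), 3))
    print(round(consistency(INCONSISTENT), 3))
```

A topic-modeling variant would replace the raw term counts with per-document topic distributions before comparing them. The regular expression here does not handle comment markers inside string literals, so it is only suitable for illustration.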
2020
978-1-7281-6926-2
comment
Natural Language Toolkit
topic modeling

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12070/60261