The so-called Relevance Index (RI) metrics are a set of recently-introduced indicators based on information theory principles that can be used to analyze complex systems by detecting the main interacting structures within them. Such structures can be described as subsets of the variables which describe the system status that are strongly statistically correlated with one another and mostly independent of the rest of the system. The goal of the work described in this paper is to apply the same principles to pattern recognition and check whether the RI metrics can also identify, in a high-dimensional feature space, attribute subsets from which it is possible to build new features which can be effectively used for classification. Preliminary results indicating that this is possible have been obtained using the RI metrics in a supervised way, i.e., by separately applying such metrics to homogeneous datasets comprising data instances which all belong to the same class, and iterating the procedure for all possible classes taken into consideration. In this work, we checked whether this would also be possible in a totally unsupervised way, i.e., by considering all data available at the same time, independently of the class to which they belong, under the hypothesis that the peculiarities of the variable sets that the RI metrics can identify correspond to the peculiarities by which data belonging to a certain class are distinguishable from data belonging to different classes. The results we obtained in experiments made with some publicly available real-world datasets show that, especially when coupled to tree-based classifiers, the performance of a RI metrics-based unsupervised feature extraction method can be comparable to or better than other classical supervised or unsupervised feature selection or extraction methods.

From complex system analysis to pattern recognition: experimental assessment of an unsupervised feature extraction method based on the Relevance Index metrics

Riccardo Pecori
;
2019-01-01

Abstract

The so-called Relevance Index (RI) metrics are a set of recently-introduced indicators based on information theory principles that can be used to analyze complex systems by detecting the main interacting structures within them. Such structures can be described as subsets of the variables which describe the system status that are strongly statistically correlated with one another and mostly independent of the rest of the system. The goal of the work described in this paper is to apply the same principles to pattern recognition and check whether the RI metrics can also identify, in a high-dimensional feature space, attribute subsets from which it is possible to build new features which can be effectively used for classification. Preliminary results indicating that this is possible have been obtained using the RI metrics in a supervised way, i.e., by separately applying such metrics to homogeneous datasets comprising data instances which all belong to the same class, and iterating the procedure for all possible classes taken into consideration. In this work, we checked whether this would also be possible in a totally unsupervised way, i.e., by considering all data available at the same time, independently of the class to which they belong, under the hypothesis that the peculiarities of the variable sets that the RI metrics can identify correspond to the peculiarities by which data belonging to a certain class are distinguishable from data belonging to different classes. The results we obtained in experiments made with some publicly available real-world datasets show that, especially when coupled to tree-based classifiers, the performance of a RI metrics-based unsupervised feature extraction method can be comparable to or better than other classical supervised or unsupervised feature selection or extraction methods.
2019
feature extraction; information theory; relevance index; unsupervised learning
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.12070/43847
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 5
  • ???jsp.display-item.citation.isi??? 3
social impact