Semi-supervised fuzzy c-means clustering of biological data

Ceccarelli M
2006

Abstract

Semi-supervised methods use a small amount of labeled data to guide unsupervised techniques. Recent literature shows that these methods outperform fully unsupervised ones even when only a small amount of side-information is available. This suggests that semi-supervised methods may be especially useful in difficult, noisy tasks where little a priori information is available, as is the case in the classification of biological datasets. The two most frequently used paradigms for including side-information in clustering are Constrained Clustering and Metric Learning. In this paper we use a Metric Learning approach as a preliminary step to fuzzy clustering, and we show that Semi-Supervised Fuzzy Clustering (SSFC) can be an effective tool for the classification of biological datasets. We validate our results on three real biological datasets using a generalized version of the Partition Entropy index. In all cases tested, the metric learning step brought out the clustering structure of the datasets more clearly.
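
As a rough illustration of the pipeline described in the abstract, the sketch below computes fuzzy c-means memberships for fixed cluster centers, optionally after mapping points through a transform that stands in for a learned metric, and then scores the partition with the classical partition entropy, -(1/N) Σ_i Σ_k u_ik log u_ik. It is not the paper's method. The function names, the toy points, and the hand-picked `shrink_y` transform are assumptions made for illustration, and the classical index is used because the generalized version is not specified here.

```python
import math

def fcm_memberships(points, centers, m=2.0, transform=None):
    """Fuzzy c-means membership update for fixed centers.

    `transform` is a hypothetical stand-in for a learned metric: distances
    are Euclidean distances between transformed points.
    """
    f = transform or (lambda p: p)
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        d = [math.dist(f(p), f(c)) for c in centers]
        if any(x == 0.0 for x in d):
            # The point coincides with a center: share membership among those centers.
            hits = [1.0 if x == 0.0 else 0.0 for x in d]
            memberships.append([h / sum(hits) for h in hits])
            continue
        memberships.append(
            [1.0 / sum((dk / dj) ** exponent for dj in d) for dk in d]
        )
    return memberships

def partition_entropy(memberships):
    """Classical partition entropy; 0 for a crisp partition, log(c) at most."""
    n = len(memberships)
    total = sum(u * math.log(u) for row in memberships for u in row if u > 0.0)
    return -total / n

# Toy data (assumed): the two groups differ along x, while y is noise.
points = [(0.0, 0.0), (0.0, 4.0), (1.0, 0.0), (1.0, 4.0)]
centers = [(0.0, 2.0), (1.0, 2.0)]

def shrink_y(p):
    # Hand-picked transform playing the role of a learned metric (assumption).
    return (p[0], 0.1 * p[1])

plain = fcm_memberships(points, centers)
learned = fcm_memberships(points, centers, transform=shrink_y)
print(partition_entropy(plain), partition_entropy(learned))
```

On this toy data the transformed space gives a lower partition entropy than the original one, that is, a less fuzzy partition. This is the sense in which a metric learning step can make a dataset's clustering structure more evident.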

Use this identifier to cite or link to this document: http://hdl.handle.net/20.500.12070/7236