Semi-supervised fuzzy c-means clustering of biological data
Ceccarelli M;
2006-01-01
Abstract
Semi-supervised methods use a small amount of labeled data to guide unsupervised techniques. Recent literature shows that these methods outperform fully unsupervised ones even with a small amount of side-information. This suggests that semi-supervised methods may be especially useful in difficult, noisy tasks where little a priori information is available, as is the case in the classification of biological datasets. The two most frequently used paradigms for including side-information in clustering are Constrained Clustering and Metric Learning. In this paper we use a Metric Learning approach as a preliminary step to fuzzy clustering, and we show that Semi-Supervised Fuzzy Clustering (SSFC) can be an effective tool for the classification of biological datasets. We used three real biological datasets and a generalized version of the Partition Entropy index to validate our results. In all cases tested, the metric learning step brought out the clustering structure of the datasets more clearly.
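
The pipeline outlined in the abstract (a metric step followed by fuzzy clustering, scored with a partition-entropy index) can be illustrated with a minimal, dependency-free sketch. It is not the paper's implementation. The abstract does not say which metric learning algorithm is used, so a hypothetical diagonal weight vector `weights` stands in for the learned metric. The index is the classical partition entropy, not the generalized version used in the paper. The synthetic data and all function names are assumptions made for illustration.

```python
import math
import random


def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Plain fuzzy c-means. Returns (centers, U); U[i][k] is the membership of point i in cluster k."""
    rng = random.Random(seed)
    dim = len(X[0])
    # Random initial memberships, normalised so that each row sums to 1.
    U = []
    for _ in X:
        row = [rng.random() + 1e-9 for _ in range(c)]
        s = sum(row)
        U.append([u / s for u in row])
    centers = []
    for _ in range(n_iter):
        # Centers: means of the points weighted by membership ** m.
        centers = []
        for k in range(c):
            w = [U[i][k] ** m for i in range(len(X))]
            total = sum(w)
            centers.append([sum(wi * x[d] for wi, x in zip(w, X)) / total
                            for d in range(dim)])
        # Memberships: proportional to (squared distance) ** (-1 / (m - 1)).
        change = 0.0
        for i, x in enumerate(X):
            inv = []
            for ctr in centers:
                d2 = max(sum((a - b) ** 2 for a, b in zip(x, ctr)), 1e-12)
                inv.append(d2 ** (-1.0 / (m - 1.0)))
            s = sum(inv)
            new_row = [v / s for v in inv]
            change = max(change, max(abs(a - b) for a, b in zip(new_row, U[i])))
            U[i] = new_row
        if change < tol:
            break
    return centers, U


def partition_entropy(U):
    """Classical partition entropy: -(1/n) * sum over i, k of u_ik * ln(u_ik)."""
    return -sum(u * math.log(u) for row in U for u in row if u > 0.0) / len(U)


def apply_diagonal_metric(X, weights):
    """Rescale features so that Euclidean distance equals the weighted (diagonal) metric."""
    return [[math.sqrt(w) * v for w, v in zip(weights, x)] for x in X]


# Synthetic data (assumption): feature 0 separates two groups, feature 1 is high-variance noise.
data_rng = random.Random(1)
X = [[data_rng.gauss(-2.0 if i < 100 else 2.0, 0.5), data_rng.gauss(0.0, 10.0)]
     for i in range(200)]

# Hypothetical weights standing in for the output of a metric learning step.
weights = [1.0, 1e-4]

_, U_raw = fuzzy_c_means(X, c=2)
_, U_metric = fuzzy_c_means(apply_diagonal_metric(X, weights), c=2)

pe_raw = partition_entropy(U_raw)
pe_metric = partition_entropy(U_metric)
print(f"partition entropy, raw features:   {pe_raw:.3f}")
print(f"partition entropy, weighted metric: {pe_metric:.3f}")
```

A lower partition entropy indicates a crisper fuzzy partition. In this toy setting, down-weighting the noisy feature plays the role that the abstract attributes to the metric learning step.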