An improved combinatorial biclustering algorithm
Napolitano F.;
2013-01-01
Abstract
DNA microarray analysis is a relevant technology in genetic research for exploring and recognizing possible genomic features of many diseases. Because it is a high-throughput technology, it requires advanced tools for dimensionality reduction of massive data sets. Clustering is among the most appropriate tools for mining these data, although it suffers from the following problems: instability of the results, a large number of genes compared with the number of samples, a high noise level, complex initialization, and the difficulty of grouping genes and samples simultaneously. Almost all of these problems can be addressed by newer techniques such as biclustering. In this paper, a new biclustering algorithm, hereafter denoted the combinatorial biclustering algorithm (CBA), is proposed to address the problems listed above. The algorithm analyzes the data to find biclusters of a desired size and allowable error. The performance of CBA is compared with that of other biclustering algorithms by running them on a synthetic data set and discussing their output. CBA seems to perform better, and for this reason it has also been applied to a real data set. In particular, CBA has been used to analyze the transcriptional profiles of 38 gastric cancer tissues with (MSI) and without (MSS) microsatellite instability. The results clearly show much more coherent gene expression behavior in normal tissues than in tumoral ones. The high level of gene misregulation in tumoral tissues affects any further bicluster analysis, and it is only partially smoothed in the MSI/MSS study, even when a much higher initial admissible error is allowed. © 2012 Springer-Verlag London Limited.
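To make the idea of "biclusters of the desired size and allowable error" concrete, the following is a minimal sketch and not the CBA procedure from the paper, whose details are not given in the abstract. It assumes a hypothetical error measure (the largest spread of values within any selected column across the selected rows) and uses a naive exhaustive search over row and column subsets, which is only feasible for toy matrices. The names `column_spread_error` and `find_biclusters` are illustrative.

```python
from itertools import combinations


def column_spread_error(matrix, rows, cols):
    """Assumed error measure: largest (max - min) within any selected
    column, taken over the selected rows. Not the paper's definition."""
    return max(
        max(matrix[r][c] for r in rows) - min(matrix[r][c] for r in rows)
        for c in cols
    )


def find_biclusters(matrix, n_rows, n_cols, max_error):
    """Exhaustively list (rows, cols) index tuples of the requested size
    whose assumed error does not exceed max_error.

    The search is exponential in the matrix size, so this is a toy
    illustration rather than a practical algorithm."""
    all_rows = range(len(matrix))
    all_cols = range(len(matrix[0]))
    found = []
    for rows in combinations(all_rows, n_rows):
        for cols in combinations(all_cols, n_cols):
            if column_spread_error(matrix, rows, cols) <= max_error:
                found.append((rows, cols))
    return found


# Toy expression matrix: rows are genes, columns are samples (made-up values).
matrix = [
    [1.0, 5.0, 9.0, 2.0],
    [1.1, 5.1, 0.0, 2.1],
    [0.9, 4.9, 4.0, 1.9],
    [7.0, 0.0, 3.0, 8.0],
]

# Genes 0-2 behave coherently on samples 0, 1 and 3.
print(find_biclusters(matrix, n_rows=3, n_cols=3, max_error=0.25))
```

Raising `max_error` admits noisier submatrices, which mirrors the trade-off mentioned above for the MSI/MSS study: a larger admissible error is needed when expression is less coherent.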