A fuzzy extension of some classical concordance measures and an efficient algorithm for their computation

CECCARELLI M;
2008-01-01

Abstract

Many indexes have been proposed in the literature for comparing two crisp data partitions, such as those produced by two classification attempts or two clustering solutions, or a predicted labeling and a true one. Most implementations of these indexes have a computational cost of O(N^2), where N is the number of data points, which may limit their use on very large datasets or their integration into computationally intensive validation strategies. Furthermore, their extension to fuzzy partitions is not obvious. In this paper we analyze efficient algorithms that compute many classical indexes (most notably the Jaccard coefficient and the Rand index) in O(d^2 + N), where d is the number of different classes/clusters, and propose a straightforward procedure to extend their computation to fuzzy partitions. The fuzzy extension is based on a pseudo-count concept and provides a natural framework for including memberships in the computation of binary similarity indexes, not limited to those reviewed here. Results on simulated data using the Jaccard coefficient highlight the higher consistency of the proposed fuzzy extension with respect to its crisp counterpart.
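The O(d^2 + N) cost quoted in the abstract can be illustrated with the standard contingency-table formulation of pair counting. The sketch below is not taken from the paper: the function names (`pair_counts`, `rand_index`, `jaccard_coefficient`, `soft_contingency`) are hypothetical, and the product-based soft table is only one assumed way to turn memberships into pseudo-counts, since the abstract does not define them.

```python
from collections import Counter


def pair_counts(labels_a, labels_b):
    """Return (a, b, c, d): pairs joined in both, only in A, only in B, in neither."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    n = len(labels_a)
    # One O(N) pass builds the contingency table and its marginals.
    table = Counter(zip(labels_a, labels_b))
    rows = Counter(labels_a)
    cols = Counter(labels_b)

    def pairs(m):
        # Number of unordered pairs among m points.
        return m * (m - 1) // 2

    # These sums visit at most d^2 table cells plus the marginals.
    a = sum(pairs(m) for m in table.values())
    b = sum(pairs(m) for m in rows.values()) - a
    c = sum(pairs(m) for m in cols.values()) - a
    d = pairs(n) - a - b - c
    return a, b, c, d


def rand_index(labels_a, labels_b):
    a, b, c, d = pair_counts(labels_a, labels_b)
    total = a + b + c + d
    return (a + d) / total if total else 1.0


def jaccard_coefficient(labels_a, labels_b):
    a, b, c, _ = pair_counts(labels_a, labels_b)
    denom = a + b + c
    return a / denom if denom else 1.0


def soft_contingency(memb_a, memb_b):
    """ASSUMPTION (not from the paper): soft counts as sums of membership products.

    memb_a[k][i] is the membership of point k in class i of partition A,
    memb_b[k][j] likewise for partition B. With one-hot memberships the
    result equals the crisp contingency table.
    """
    d_a, d_b = len(memb_a[0]), len(memb_b[0])
    table = [[0.0] * d_b for _ in range(d_a)]
    for u, v in zip(memb_a, memb_b):
        for i in range(d_a):
            for j in range(d_b):
                table[i][j] += u[i] * v[j]
    return table
```

For example, `pair_counts([0, 0, 1, 1], [0, 0, 1, 2])` gives one pair joined in both partitions, one joined only in the first, none joined only in the second, and four joined in neither, so the Jaccard coefficient is 0.5 and the Rand index is 5/6. How the paper derives fuzzy indexes from its pseudo-counts is not stated in the abstract and is deliberately left out of this sketch.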

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12070/2252