A New Approach to Cohesion Measurement: Region-Based Clustering Validation

Authors

  • Sakar Salar Salih, Department of Software Engineering and Informatics, College of Engineering, Salahaddin University-Erbil, Kurdistan Region, Iraq
  • Polla Fattah, Department of Software Engineering and Informatics, College of Engineering, Salahaddin University-Erbil, Kurdistan Region, Iraq

DOI:

https://doi.org/10.21271/ZJPAS.36.1.5

Keywords:

Clustering, clustering validation index, internal validation index, region-based clustering validation index, clustering cohesion.

Abstract

Clustering assigns objects to clusters based on similarity, aiming to ensure that objects within the same cluster are similar and those in different clusters are dissimilar. Evaluating clustering quality is both crucial and challenging. Researchers have therefore proposed clustering validation indices, which fall into two families: internal and external. Internal indices assess clustering quality using only the information intrinsic to the dataset. We focus on internal validation indices because of their real-world applicability. In this paper, we propose a novel region-based internal validation index, the Region Cluster Validation (RCV) index. The index divides each cluster into three distinct regions (inner, middle, and outer) according to the cluster's center and its corresponding radius. The average distance is then computed for each region, and a penalty factor is applied to each of these average distances. Summing the three penalized average distances gives an RCV score for each cluster. The RCV scores of all clusters are then summed to yield an overall measure of cluster validity. A lower index value indicates better clustering quality. Experimental results on synthetic and real-world datasets demonstrate the usability and effectiveness of the RCV index.
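
The procedure described in the abstract can be sketched in Python as follows. This is only an illustrative reading of the abstract, not the authors' implementation: the abstract does not state how the radius, the region boundaries, the distance reference point, or the penalty factors are defined. The sketch assumes that the center is the cluster mean, the radius is the largest point-to-center distance, the regions are three equal-width bands of that radius, distances are measured from each point to the center, and the penalty factors are 1, 2, and 3. The function name rcv_score and its parameters are hypothetical.

```python
import numpy as np


def rcv_score(X, labels, penalties=(1.0, 2.0, 3.0)):
    """Illustrative region-based cohesion score (lower is better).

    Assumptions not taken from the abstract: centroid as center,
    max distance as radius, equal-width regions, penalties 1/2/3.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for k in np.unique(labels):
        pts = X[labels == k]
        center = pts.mean(axis=0)
        dist = np.linalg.norm(pts - center, axis=1)
        radius = dist.max()
        if radius == 0.0:
            # Single point or identical points: no spread to penalize.
            continue
        # Region 0 = inner, 1 = middle, 2 = outer (thirds of the radius).
        region = np.minimum((3 * dist / radius).astype(int), 2)
        for r, weight in enumerate(penalties):
            d = dist[region == r]
            if d.size:
                # Penalized average distance of this region.
                total += weight * d.mean()
    return total


# Two tight, well-separated pairs of points.
X = [[0, 0], [0, 1], [10, 10], [10, 11]]
good = [0, 0, 1, 1]  # groups neighbouring points together
bad = [0, 1, 0, 1]   # mixes points from the two groups

print(rcv_score(X, good), rcv_score(X, bad))
```

Under these assumptions the partition that groups neighbouring points together receives the lower score, in line with the abstract's statement that a lower index value indicates better clustering quality.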

Published

2024-02-05

How to Cite

Sakar Salar Salih, & Polla Fattah. (2024). A New Approach to Cohesion Measurement: Region-Based Clustering Validation. Zanco Journal of Pure and Applied Sciences, 36(1), 49–62. https://doi.org/10.21271/ZJPAS.36.1.5

Section

Engineering and Computer Sciences