Abstract

More than 50 clustering algorithms have been developed to extract meaningful information from large data sets and to group individuals according to their characteristics. In practice, data often contain variables of several types, so it is important to select a clustering algorithm suited to the data types at hand. In this study, we describe the EM (Expectation Maximization) and Two-Step clustering methods, which were developed in recent years and are among the best methods for data sets containing mixed types of variables. Our second aim is to compare the two methods on a data set drawn from the health field. These algorithms are generally recommended for large data sets, but they are also used on medium-sized data sets, which are more common in practice. Therefore, fifty controls and fifty patients with polycystic ovary syndrome were included in the study. Nineteen variables were measured on these subjects, of which thirteen were quantitative and six were qualitative. Clusters were obtained with the EM and Two-Step clustering methods. Agreement between the clusters produced by each algorithm and the known patient and control groups was evaluated with the Kappa coefficient. The EM clustering algorithm showed higher agreement than Two-Step clustering (Kappa = 0.740; p < 0.001) and was the better algorithm for identifying both patients and controls. We conclude that researchers may obtain successful results in classifying diseases by using appropriate clustering methods.
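The agreement measure used above can be illustrated with a minimal sketch of Cohen's Kappa between known group labels and cluster assignments. This is not the study's analysis: the function name `cohen_kappa` and the ten toy labels are hypothetical, and the sketch assumes each cluster has already been matched to the group it mostly contains.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's Kappa for two equally long sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of subjects given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: sum over categories of the product of marginal shares.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    categories = set(count_a) | set(count_b)
    expected = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        # Both sequences use a single identical category; agreement is total.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy data for illustration only (NOT the study's data): 5 patients, 5 controls.
known = ["patient"] * 5 + ["control"] * 5
# Hypothetical cluster assignments, already mapped to group names;
# one patient has been placed in the control cluster.
assigned = ["patient"] * 4 + ["control"] * 6

kappa = cohen_kappa(known, assigned)
print(round(kappa, 3))
```

Kappa corrects the raw share of matching labels for the agreement expected by chance, which is why it is preferred over simple accuracy when comparing cluster membership with known groups.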
