Abstract
The invention relates to a method based on k-means that includes innovations intended to customize and homogenize data mining, such as the following: the user identifies the clustering attributes and sets a preference degree for each (e.g., to demand a higher degree of proximity among the members of a cluster, or to allow more freedom); the user sets the convergence threshold (i.e., the percentage of satisfaction with the reassignment of members to the clusters); the method provides the initial centroid values in symmetric regions of equal size (i.e., the minimum and maximum values of each region are symmetric with respect to its centroid); the method computes the Euclidean distance in a standardized and balanced manner: instead of accumulating raw differences between the centroids and attributes whose values are expressed in heterogeneous units (e.g., hundredths, millions), it uses distance percentages (i.e., the difference between the centroid and the value of an attribute is expressed relative to the centroid and to all the values of that attribute stored in the information repository), and these percentages are magnified or reduced in proportion to the preference assigned to the attribute (i.e., the more relevant the attribute, the more its percentage grows, for instance by a factor above 1.0 such as 1.1 or 1.25; the less relevant the attribute, the more its percentage shrinks, by a factor below 1.0 such as 0.9 or 0.75); the method ends the mining once the user-defined threshold is satisfied (i.e., it avoids reprocessing members that were recently assigned to a new cluster).
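The Python sketch below illustrates one possible reading of these steps; it is not the patented method. Several details are assumptions: each attribute's percentage difference is taken as the gap divided by that attribute's full range (maximum minus minimum over the repository), the initial centroids are placed at the midpoints of k equal-width intervals per attribute, and mining stops once the share of members that kept their cluster in a pass reaches the threshold. The function names `initial_centroids`, `weighted_pct_distance` and `preference_kmeans` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Row = Sequence[float]


def initial_centroids(data: Sequence[Row], k: int) -> Tuple[List[List[float]], List[float], List[float]]:
    """Assumed 'symmetric regions' rule: split each attribute's [min, max]
    range into k equal-width intervals and put centroid j at the midpoint of
    interval j, so that interval's min and max are symmetric around it."""
    n_attr = len(data[0])
    lows = [min(row[a] for row in data) for a in range(n_attr)]
    highs = [max(row[a] for row in data) for a in range(n_attr)]
    cents = [
        [lo + (hi - lo) * (2 * j + 1) / (2 * k) for lo, hi in zip(lows, highs)]
        for j in range(k)
    ]
    return cents, lows, highs


def weighted_pct_distance(point: Row, centroid: Row, ranges: Row, weights: Row) -> float:
    """Euclidean distance over percentage differences: each attribute's gap is
    divided by that attribute's range (assumed normalizer), then scaled by its
    preference weight (>1.0 magnifies, <1.0 reduces)."""
    total = 0.0
    for x, c, r, w in zip(point, centroid, ranges, weights):
        pct = abs(x - c) / r if r > 0 else 0.0
        total += (w * pct) ** 2
    return math.sqrt(total)


def preference_kmeans(data: Sequence[Row], k: int, weights: Row,
                      threshold: float = 0.95,
                      max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    cents, lows, highs = initial_centroids(data, k)
    ranges = [hi - lo for lo, hi in zip(lows, highs)]
    labels = [-1] * len(data)  # -1 means "not yet assigned"
    for _ in range(max_iter):
        changed = 0
        for i, row in enumerate(data):
            best = min(range(k),
                       key=lambda j: weighted_pct_distance(row, cents[j], ranges, weights))
            if best != labels[i]:
                labels[i] = best
                changed += 1
        # Recompute each centroid as the mean of its members; keep it if empty.
        for j in range(k):
            members = [row for row, lab in zip(data, labels) if lab == j]
            if members:
                cents[j] = [sum(col) / len(members) for col in zip(*members)]
        # Assumed threshold rule: stop once the share of members that kept
        # their cluster in this pass reaches the user-defined threshold.
        if 1 - changed / len(data) >= threshold:
            break
    return labels, cents


# Two attributes in very different units (units vs. millions).
data = [[1.0, 1000.0], [1.2, 1100.0], [0.9, 950.0],
        [10.0, 1_000_000.0], [10.5, 1_050_000.0], [9.8, 990_000.0]]
labels, cents = preference_kmeans(data, k=2, weights=[1.25, 0.75])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

With `weights=[1.25, 0.75]`, differences in the first attribute count more than differences in the second, while the range normalization keeps the second attribute's values in the millions from dominating the distance.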