Title of invention DATA MINING OF VERY LARGE SPATIAL DATASET
Abstract A system and method for performing and accelerating cluster analysis of large data sets are presented. The data set is formatted into binary bit Sequential (bSQ) format and then structured into a Peano Count tree (P-tree) format, which is a lossless tree representation of the original data. A P-tree algebra is defined and used to formulate a vertical set inner product (VSIP) technique that can efficiently and scalably measure the mean value and total variation of a set about a fixed point in the large dataset. The set can be any projected subspace of any vector space, including oblique subspaces. The VSIPs are used to determine the closeness of a point to a set of points in the large dataset, which makes them useful in classification, clustering, and outlier detection. One advantage is that the number of centroids (k) need not be pre-specified but is effectively determined. The high quality of the centroids makes them useful in partitioning clustering methods such as k-means and k-medoids clustering. The present invention also identifies the outliers.
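As a rough illustration of the vertical set inner product idea described in the abstract, the following Python sketch computes the total variation of a set about a fixed point using only counts over vertical bit slices. It is a sketch under stated assumptions, not the patented implementation: plain Python integers stand in for uncompressed bit vectors rather than actual P-trees, attribute values are assumed to be non-negative integers, and the names `bit_slices`, `popcount`, and `total_variation` are hypothetical. It relies on the identity sum((x - a)^2) = sum(x^2) - 2*a*sum(x) + n*a^2, applied per attribute.

```python
from typing import List, Sequence


def bit_slices(column: Sequence[int], nbits: int) -> List[int]:
    """Split one attribute column into vertical bit slices.

    Slice j is an integer bitmask whose bit r equals bit j of column[r].
    (Assumption: an uncompressed bitmask stands in for a P-tree.)
    """
    slices = [0] * nbits
    for row, value in enumerate(column):
        for j in range(nbits):
            if (value >> j) & 1:
                slices[j] |= 1 << row
    return slices


def popcount(mask: int) -> int:
    """Count the 1 bits in a bitmask (the 'root count' of a slice)."""
    return bin(mask).count("1")


def total_variation(slices_per_attr: Sequence[Sequence[int]],
                    set_mask: int,
                    a: Sequence[int]) -> int:
    """Sum of squared distances from the rows selected by set_mask to point a.

    Only ANDs and bit counts over the slices are used; no row is ever
    reconstructed horizontally.
    """
    n = popcount(set_mask)
    tv = 0
    for slices, a_i in zip(slices_per_attr, a):
        sum_x = 0   # sum of x_i over the set
        sum_xx = 0  # sum of x_i squared over the set
        for j, pj in enumerate(slices):
            pj_in_set = pj & set_mask
            sum_x += (1 << j) * popcount(pj_in_set)
            for k, pk in enumerate(slices):
                # x^2 = sum_j sum_k 2^(j+k) * b_j * b_k, counted over rows
                sum_xx += (1 << (j + k)) * popcount(pj_in_set & pk)
        tv += sum_xx - 2 * a_i * sum_x + n * a_i * a_i
    return tv


# Hypothetical toy data: three 2-attribute points, 3 bits per attribute.
points = [(1, 2), (3, 4), (5, 6)]
slices = [bit_slices([p[d] for p in points], 3) for d in range(2)]
all_rows = (1 << len(points)) - 1
print(total_variation(slices, all_rows, (3, 4)))
```

For the toy points (1, 2), (3, 4), (5, 6) and the fixed point (3, 4), the squared distances are 8, 0, and 8, so the sketch prints 16. Restricting `set_mask` to a subset of rows measures the variation of that subset only, which is how a closeness-to-set score could be compared across candidate clusters.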
Application publication number WO2006055894(A3) Application publication date 2006.11.23
Application number WO2005US42101 Filing date 2005.11.17
Applicant NORTH DAKOTA STATE UNIVERSITY;PERRIZO, WILLIAM, K.;ABIDIN, TAUFIK, FUADI;PERERA, AMAL, SHEHAN;SERAZI, MD., MASUM Inventor PERRIZO, WILLIAM, K.;ABIDIN, TAUFIK, FUADI;PERERA, AMAL, SHEHAN;SERAZI, MD., MASUM
Classification number G06F17/30 Main classification number G06F17/30
Agency Agent
Principal claim
Address