Title: System and method for determining internal parameters of a data clustering program
Abstract: A system and associated method tune a data clustering program to a clustering task by determining at least one of its internal parameters. These internal parameters are determined before clustering begins, so clustering does not need to be run iteratively. This reduces the processing time and processing resources the clustering program requires. The system presents pairs of data records, and for each pair the user indicates whether or not the two records should belong to the same cluster. The similarity value of each selected pair is calculated using the default parameters of the clustering program. An optimal similarity threshold is then determined from the resulting similarity values. When the optimization criterion does not yield a single optimal similarity threshold range, equivalent candidate ranges are selected. To choose among the candidate ranges, pairs of data records whose calculated similarity value lies within the critical region are offered to the user.
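The threshold-selection step described in the abstract can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the patented implementation. It assumes that the optimization criterion is the number of user-labeled pairs a threshold misclassifies (the abstract does not name the criterion), that a pair counts as "same cluster" when its similarity is at or above the threshold, and that the critical region is the similarity span between the first and last candidate ranges, where a pair's classification depends on which range is chosen. The functions `misclassifications`, `optimal_ranges` and `pairs_to_offer` are hypothetical. Similarities are taken as already computed with the program's default parameters.

```python
import math
from typing import List, Tuple

# Hypothetical record format: (similarity under default parameters, user says "same cluster")
LabeledPair = Tuple[float, bool]
Range = Tuple[float, float]  # threshold interval (low, high]: low exclusive, high inclusive


def misclassifications(pairs: List[LabeledPair], threshold: float) -> int:
    # Assumed criterion: count pairs whose thresholded verdict disagrees with the user.
    return sum((sim >= threshold) != same for sim, same in pairs)


def optimal_ranges(pairs: List[LabeledPair]) -> Tuple[int, List[Range]]:
    # Classification only changes at observed similarity values, so the threshold
    # axis splits into intervals (b_i, b_{i+1}] on which the error count is constant.
    bounds = [-math.inf] + sorted({sim for sim, _ in pairs}) + [math.inf]
    intervals = list(zip(bounds[:-1], bounds[1:]))
    # Any threshold in (low, high] behaves like threshold == high.
    errors = [misclassifications(pairs, high) for _, high in intervals]
    best = min(errors)
    ranges: List[Range] = []
    for (low, high), err in zip(intervals, errors):
        if err != best:
            continue
        if ranges and ranges[-1][1] == low:
            ranges[-1] = (ranges[-1][0], high)  # merge adjacent optimal intervals
        else:
            ranges.append((low, high))  # start a new, separate candidate range
    return best, ranges


def pairs_to_offer(pool: List[Tuple[str, float]], ranges: List[Range]) -> List[Tuple[str, float]]:
    # A single optimal range needs no further user input.
    if len(ranges) < 2:
        return []
    # Assumed critical region: pairs with similarity between the first range's upper
    # bound and the last range's lower bound flip verdict depending on the choice.
    lo, hi = ranges[0][1], ranges[-1][0]
    return [(pair_id, sim) for pair_id, sim in pool if lo <= sim <= hi]


labeled = [(0.9, True), (0.7, False), (0.5, True), (0.2, False)]
best, ranges = optimal_ranges(labeled)
print(best, ranges)
print(pairs_to_offer([("a-b", 0.55), ("c-d", 0.95), ("e-f", 0.6), ("g-h", 0.1)], ranges))
```

With this labeled set, two equally good ranges, (0.2, 0.5] and (0.7, 0.9], each misclassify one pair. Only the pool pairs with similarity between 0.5 and 0.7 are therefore offered to the user for labeling. Counting misclassifications is only one plausible criterion. A weighted cost or an F-measure could replace `misclassifications` without changing the rest of the sketch.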
Publication number: US2003204484(A1); Publication date: 2003.10.30
Application number: US20030390132; Filing date: 2003.03.14
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION; Inventors: CHARPIOT BORIS; HARTEL BARBARA; LINGENFELDER CHRISTOPH; MAIER THILO
Classification: G06F7/00; G06F17/30; (IPC1-7): G06F7/00; Main classification: G06F7/00