Title of invention: Method for screening samples for building prediction model and computer program product thereof
Abstract: A method for screening samples for building a prediction model and a computer program product thereof are provided. When a new set of sample data is added to a dynamic moving window (DMW), a clustering step is performed on all of the sets of sample data within the window, grouping the sets of sample data with similar properties into one group. If the number of sets of sample data in the largest group is greater than a predetermined threshold, the largest group contains too many sets of sample data with similar properties, and the oldest set of sample data in the largest group can be deleted; if the number is smaller than or equal to the predetermined threshold, the sample data in the largest group are sufficiently unique and should be kept for building or refreshing the prediction model.
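The window update described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the abstract does not name a clustering algorithm, so `leader_cluster` (a simple distance-threshold "leader" scheme), its `radius` parameter, and the function name `add_to_window` are all hypothetical stand-ins. The window is assumed to be ordered oldest first.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Sample = Tuple[float, ...]


def leader_cluster(samples: Sequence[Sample], radius: float) -> List[int]:
    """Hypothetical stand-in clustering (not taken from the patent).

    Each sample joins the first group whose leader lies within `radius`
    (Euclidean distance); otherwise it starts a new group.
    """
    leaders: List[Sample] = []
    labels: List[int] = []
    for s in samples:
        for gid, leader in enumerate(leaders):
            dist = sum((a - b) ** 2 for a, b in zip(s, leader)) ** 0.5
            if dist <= radius:
                labels.append(gid)
                break
        else:
            # No existing leader is close enough: open a new group.
            leaders.append(s)
            labels.append(len(leaders) - 1)
    return labels


def add_to_window(window: List[Sample], new_sample: Sample,
                  threshold: int, radius: float = 1.0) -> List[Sample]:
    """Add `new_sample` to the window (oldest first) and screen it."""
    window = window + [new_sample]
    labels = leader_cluster(window, radius)
    counts = Counter(labels)
    # Largest group; on a tie, max() keeps the group encountered first.
    largest_gid = max(counts, key=counts.get)
    if counts[largest_gid] > threshold:
        # Too many similar samples: drop the oldest one in the largest group.
        oldest_idx = labels.index(largest_gid)
        window = window[:oldest_idx] + window[oldest_idx + 1:]
    return window
```

With a threshold of 3, adding a fifth sample that lands in a four-member group removes the oldest member of that group; with a threshold of 4, all five samples are kept.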
Publication No.: US8862525(B2) Publication date: 2014.10.14
Application No.: US201213667039 Filing date: 2012.11.02
Applicant: National Cheng Kung University Inventors: Cheng Fan-Tien; Wu Wei-Ming
Classification: G06N3/08; G01W1/10 Main classification: G06N3/08
Agency: McClure, Qualey & Rodack, LLP Agent: McClure, Qualey & Rodack, LLP
Main claim: 1. A computer implemented method for screening samples for building a prediction model, comprising: obtaining a plurality of sets of first sample data sequentially generated with respect to a target to be predicted, the sets of first sample data comprising: a plurality of sets of monitored data; and a plurality of objective data, wherein the objective data are corresponding to the sets of monitored data in a one-to-one manner and are cause-and-result related; performing a clustering step with respect to all of the sets of first sample data for grouping the sets of first sample data with high similarities as one group, thereby forming and obtaining a plurality of first groups; searching for at least one of the first groups having the most number of sets of first sample data, thereby obtaining at least one second group; determining if the number of the at least one second group is greater than or equal to 2, thus obtaining a first determination result; searching for one of the at least one second group having the oldest set of first sample data when the first determination result is yes, thereby obtaining a third group; and determining if the number of sets of first data in the third group is smaller than a predetermined number, thus obtaining a second determination result; determining if the number of sets of first data in the second group is smaller than the predetermined number when the first determination result is no, thus obtaining a third determination result; reserving all of the sets of first sample data for building or refreshing the prediction model when the second determination result or the third determination result is yes, wherein the prediction model is used for predicting a status or behavior of the target; discarding the oldest set of first sample data in the third group and reserving the remaining sets of first sample data for building or refreshing the prediction model when the second determination result is no; and discarding the oldest set of first sample data in the second group and reserving the remaining sets of first sample data for building or refreshing the prediction model when the third determination result is no.
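The decision steps recited in the claim, including the tie-break among equally large groups, can be sketched as below. This is only an illustrative reading of the claim language, assuming the clustering step has already produced one group label per set of sample data, listed oldest first. The function name `index_to_discard` and its parameters are hypothetical. The claim reserves all data only when the group size is "smaller than" the predetermined number, so a size equal to that number leads to a discard in this sketch.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def index_to_discard(labels: Sequence[Hashable],
                     predetermined_number: int) -> Optional[int]:
    """Return the index of the set of sample data to discard, or None.

    `labels[i]` is the group of the i-th set of sample data, oldest first
    (an assumed input layout, not specified by the claim).
    """
    counts = Counter(labels)
    max_size = max(counts.values())
    # "Second group(s)": every group that holds the most sets of sample data.
    second_groups = {g for g, n in counts.items() if n == max_size}

    if len(second_groups) >= 2:
        # First determination is yes: pick the tied group that contains
        # the oldest set of sample data (the "third group").
        target = next(g for g in labels if g in second_groups)
    else:
        # First determination is no: the single largest group is the target.
        target = next(iter(second_groups))

    # Second / third determination: is the group smaller than the number?
    if max_size < predetermined_number:
        return None  # reserve all sets of sample data
    # Otherwise discard the oldest set of sample data in the target group.
    return labels.index(target)
```

For example, with labels `["a", "b", "b", "a", "c"]` and a predetermined number of 2, groups `a` and `b` tie at two members each, group `a` holds the oldest entry, and index 0 is selected for discarding; with a predetermined number of 3, nothing is discarded.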
Address: Tainan TW