Title of invention: Extracting predictive segments from sampled data
Abstract: A system and method are disclosed that predict the relative occurrence or presence of an event or item from sample data comprising samples that contain the event or item and samples that do not. Each sample also carries any number of descriptive attributes, which may be continuous, binary, or categorical variables. Given the sampled data, the system automatically creates statistically optimal segments from which a functional input/output relationship can be derived. These segments can be used directly as a lookup table or, in some cases, as input data to a secondary modeling system such as a linear regression module, a neural network, or another predictive system.
Publication number: US9147159(B2) Publication date: 2015.09.29
Application number: US201213731075 Filing date: 2012.12.30
Applicant: CERTONA CORPORATION Inventors: Hueter Geoffrey J.; Farber Benjamin S.
Classification: G06N5/02; G06Q10/04; G06Q30/02 Main classification: G06N5/02
Agency: Agent: Clarke Richard D.
Main claim: 1. A computer implemented web-based predictive modeling method to extract predictive segments from sampled data used for predicting subject response, comprising the steps of: providing segmentation attributes and sampled data; and analyzing the distribution of sampled data; wherein said analysis of the distribution of sampled data comprises the steps of: ordering the transactions and occurrences by dimension and treating each dimension one at a time and independently; creating a cumulative sequence by adding P to the previous value when the next transaction contains the item of interest and subtracting A when the item is not present, such that P=1/N_P, where N_P is the total number of transactions containing the item of interest, and A=1/N_A, where N_A is the total number of transactions not containing the item of interest, and where the total number of transactions is N_total=N_P+N_A; determining the sequence of maximum relative probability of the item wherein the max and the min correspond to the candidate partition points of the dimension; partitioning the dimension using the point furthest from the edge of the domain of the dimension in sample order; and calculating the density factor d=r/s, whereby r=(number of items of interest in peak sequence) and s=(number of all items in peak sequence) and d is a number between 0 and 1; whereby the density factor is considered significant if R=(r−r_avg)/√r≥T, where r_avg=s·N_P/N_total and T is a predetermined user specified significance threshold.
Address: San Diego CA US
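
The analysis steps recited in the main claim (cumulative sequence, candidate partition points, density factor, significance test) can be illustrated with a minimal Python sketch for a single dimension. This is only one possible reading of the claim language, not the patented implementation. The function names, the tie-breaking rule, the measure of "distance from the edge" (the size of the smaller side of the split), and the choice to evaluate the segment left of the split as the "peak sequence" are all assumptions made for illustration.

```python
from math import sqrt


def cumulative_sequence(values, has_item):
    """Order samples by one dimension and build the +P / -A running sum.

    P = 1/N_P and A = 1/N_A, so the sequence always ends at (about) zero.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    flags = [bool(has_item[i]) for i in order]
    n_p = sum(flags)              # transactions containing the item
    n_a = len(flags) - n_p        # transactions not containing the item
    if n_p == 0 or n_a == 0:
        raise ValueError("need samples both with and without the item")
    p, a = 1.0 / n_p, 1.0 / n_a
    seq, total = [], 0.0
    for flag in flags:
        total += p if flag else -a
        seq.append(total)
    return flags, seq


def partition_index(seq):
    """Choose between the max and the min of the sequence (the candidate
    partition points), taking the one furthest from the domain edge.

    Assumption: distance from the edge is the size of the smaller side
    when splitting just after index i; ties resolve to the first index.
    """
    n = len(seq)

    def edge_distance(i):
        return min(i + 1, n - (i + 1))

    i_max = max(range(n), key=seq.__getitem__)
    i_min = min(range(n), key=seq.__getitem__)
    return max((i_max, i_min), key=edge_distance)


def density_significance(r, s, n_p, n_total, threshold):
    """Return (d, R, significant) with d = r/s and R = (r - r_avg)/sqrt(r)."""
    d = r / s
    r_avg = s * n_p / n_total     # expected count if the item were uniform
    big_r = (r - r_avg) / sqrt(r) if r > 0 else float("-inf")
    return d, big_r, big_r >= threshold


# Illustrative data only: one attribute and an item-present flag per sample.
values = [1, 2, 3, 4, 5, 6, 7, 8]
has_item = [True, True, True, False, True, False, False, False]

flags, seq = cumulative_sequence(values, has_item)
split = partition_index(seq)
segment = flags[: split + 1]      # assumed "peak sequence": left of the split
d, big_r, significant = density_significance(
    r=sum(segment), s=len(segment),
    n_p=sum(flags), n_total=len(flags), threshold=0.5,
)
```

Because each dimension is treated independently, the same three functions would simply be applied once per attribute, each with its own ordering of the samples.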