Abstract The essence of the present embodiment is to have a genetic algorithm determine the optimal or near-optimal variable composition of multivariate models through an evolutionary process. Given any data set, the present embodiment automatically identifies the relevant variables and constructs the most effective combination of those variables to achieve one or more objectives. The objective may be high explanatory power, high predictive power, a response measure, or any other objective that the user defines within the fitness function. Because the present embodiment automatically constructs the variable composition for any data set so as to satisfy the objective(s) defined in the fitness function, it is in effect an adaptive multivariate model-building methodology. In other words, the algorithm of the present embodiment also addresses the need for dynamic capability in multivariate models, since its adaptive, evolutionary nature automatically detects and incorporates the most relevant variables. By employing a genetic algorithm, the present embodiment can find the global optimum, or a near-global optimum, of the variable combination using an acceptable amount of time and resources, far less than a full-permutation search over the variables would require. By utilizing a genetic algorithm, the present embodiment solves the sequential F-test problem by conducting a non-sequential, non-linear search. This is accomplished by allowing one or more genes to mutate in random order, thereby including or excluding at least one variable in random order. The algorithm resolves the partial F-test dilemma because the variable test procedure evaluates the whole genome (i.e., all candidate variables intact), maintaining the fidelity of a full variable-membership test throughout its permutation. Furthermore, the stochastic nature of the genetic algorithm neutralizes the prejudices that manual decisions introduce into variable identification when a human modeler performs the task.
Variable membership in the algorithm is based on "survival of the fittest", in which the best genomes (i.e., the best combinations of variables) emerge as the optimal solutions among the best of breed. In short, the genetic algorithm determines the optimal multivariate model together with a pool of near-optimal ones. The algorithm operates by modifying the parameters of the multivariate models, which in effect changes the composition of variables in the models. The algorithm generates genome populations iteratively until a genome that meets the criteria of the desired multivariate model has been found, within the allowable time and reasonable resources.
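The evolutionary search described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. The function name `evolve`, the single-point crossover step, the keep-the-top-half survivor rule, all parameter defaults, and the toy fitness function with its `RELEVANT` set are assumptions introduced purely for illustration; the abstract itself specifies only a genome of candidate variables, random gene mutation, a user-defined fitness function, and iteration until a stopping criterion is met.

```python
import random

def evolve(n_vars, fitness, pop_size=30, generations=60,
           mutation_rate=0.1, target=None, seed=0):
    """Search for a high-fitness variable subset encoded as a 0/1 genome.

    Requires n_vars >= 2 and pop_size >= 4. Returns the final population,
    ranked best first.
    """
    rng = random.Random(seed)
    # Bit i == 1 means candidate variable i is included in the model.
    population = [[rng.randint(0, 1) for _ in range(n_vars)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Score whole genomes, so every variable is judged in full context.
        population.sort(key=fitness, reverse=True)
        if target is not None and fitness(population[0]) >= target:
            break  # stopping criterion met
        # "Survival of the fittest": keep the better half unchanged (assumed rule).
        survivors = population[:pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_vars)  # single-point crossover (assumed)
            child = a[:cut] + b[cut:]
            # Mutation: each gene may flip, adding or dropping that variable.
            for i in range(n_vars):
                if rng.random() < mutation_rate:
                    child[i] = 1 - child[i]
            children.append(child)
        population = survivors + children
    population.sort(key=fitness, reverse=True)
    return population

# Hypothetical toy objective: reward three "relevant" variables and
# penalize every extra one. A real fitness would score a fitted model.
RELEVANT = {1, 4, 6}

def toy_fitness(genome):
    hits = sum(1 for i, g in enumerate(genome) if g and i in RELEVANT)
    extras = sum(1 for i, g in enumerate(genome) if g and i not in RELEVANT)
    return hits - 0.5 * extras

ranked = evolve(n_vars=10, fitness=toy_fitness, target=3)
best = ranked[0]
```

Here `ranked[0]` is the best genome found and the remaining entries correspond to the pool of near-optimal candidates mentioned above. In practice the toy objective would be replaced by a fitness function that fits the multivariate model on the selected variables and returns the chosen measure, such as explanatory or predictive power.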
Publication No. WO2008112469(A1) Publication Date 2008.09.18
Application No. WO2008US55873 Filing Date 2008.03.05
Classification G06F17/00 Main Classification G06F17/00
Agency Agent