Title of invention: GENERAL FRAMEWORK FOR CROSS-VALIDATION OF MACHINE LEARNING ALGORITHMS USING SQL ON DISTRIBUTED SYSTEMS
Abstract: A general framework for cross-validation of any supervised learning algorithm on a distributed database comprises a multi-layer software architecture that implements training, prediction and metric functions in a C++ layer and iterates processing of different subsets of a data set with a plurality of different models in a Python layer. The best model is determined to be the one with the smallest average prediction error across all database segments.
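The selection rule stated in the abstract (pick the model with the smallest average prediction error across segments) can be illustrated with a minimal Python sketch. The function name `select_best_model`, the model labels, and the per-segment error values are hypothetical and made up for illustration; they are not taken from the patent.

```python
from statistics import mean


def select_best_model(errors_by_model):
    """Return the model whose average error across segments is smallest.

    errors_by_model maps a model label to a list of prediction errors,
    one per database segment (hypothetical layout, assumed for this sketch).
    """
    averages = {name: mean(errs) for name, errs in errors_by_model.items()}
    best = min(averages, key=averages.get)
    return best, averages


# Made-up per-segment errors for three candidate models.
segment_errors = {
    "lambda=0.01": [0.42, 0.40, 0.45],
    "lambda=0.1": [0.31, 0.35, 0.30],
    "lambda=1.0": [0.52, 0.50, 0.55],
}
best, averages = select_best_model(segment_errors)
print(best)
```

In this sketch the comparison happens on the client side; in the described framework the errors would presumably be aggregated from the database segments before the models are compared.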
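Publication number: US2016092794(A1) Publication date: 2016.03.31
Application number: US201514963061 Filing date: 2015.12.08
Applicant: EMC Corporation Inventors: Qian Hai; Iyer Rahul; Yang Shengwen; Welton Caleb E.
Classification: G06N99/00 Main classification: G06N99/00
Agency: (not listed) Agent: (not listed)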
Main claim: 1. A method of cross-validation of a supervised machine learning algorithm within a distributed database having a plurality of database segments in which data are stored, comprising: partitioning a data set within said database into a training subset and a validation subset, wherein partitioning the data set comprises partitioning the data set according to randomly sorted data to create two data subsets that are independent and statistically equivalent; determining coefficients of a first model of said supervised machine learning algorithm using the training subset; predicting a value of a data element in said validation subset using said first model; determining a prediction error based at least in part on a difference between said predicted value and the actual value of said data element; successively repeating said partitioning k times to form k different partitions, wherein at least a subset of the k different partitions have different training and validation subsets; determining corresponding k prediction errors based at least in part on iteratively determining the coefficients, predicting the value of the data element, and determining the prediction error for each of said k partitions; and evaluating the performance of said first model using said k prediction errors.
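The steps recited in the claim (random partition, fit, predict, measure error, repeat k times, evaluate) can be sketched in plain Python on a single machine. This is only an illustration under stated assumptions: the model is a one-variable least-squares line, the error metric is mean squared error, each repetition draws a fresh random split, and the names `fit_line`, `predict`, `cross_validate`, and `validation_fraction` are hypothetical. The patent itself targets SQL on a distributed database with C++ and Python layers, none of which is reproduced here.

```python
import random
from statistics import mean


def fit_line(rows):
    """Least-squares fit of y = a + b*x; returns the coefficients (a, b)."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in rows)
    b = sxy / sxx
    return y_bar - b * x_bar, b


def predict(coef, x):
    """Predict y for a single x from the fitted coefficients."""
    a, b = coef
    return a + b * x


def cross_validate(rows, k=5, validation_fraction=0.2, seed=0):
    """Repeat a random train/validation split k times; return k errors."""
    rng = random.Random(seed)
    n_val = max(1, int(len(rows) * validation_fraction))
    errors = []
    for _ in range(k):
        shuffled = rows[:]
        rng.shuffle(shuffled)  # random sort, then cut into two subsets
        validation, training = shuffled[:n_val], shuffled[n_val:]
        coef = fit_line(training)  # determine model coefficients
        # Mean squared difference between predicted and actual values.
        errors.append(mean((predict(coef, x) - y) ** 2 for x, y in validation))
    return errors


# Synthetic, noise-free data on the line y = 2x + 1 (illustrative only).
data = [(float(x), 2.0 * x + 1.0) for x in range(20)]
errors = cross_validate(data)
print(len(errors), mean(errors))
```

Averaging the k errors gives one performance figure for the model; running the same loop for several candidate models and comparing the averages is one way the evaluation step could be used for model selection.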
Address: Hopkinton, MA, US