CONSISTENT FILTERING OF MACHINE LEARNING DATA, Application No. US201414460314

Title	CONSISTENT FILTERING OF MACHINE LEARNING DATA
Abstract	Consistency metadata, including a parameter for a pseudo-random number source, are determined for training-and-evaluation iterations of a machine learning model. Using the metadata, a first training set comprising records of at least a first chunk is identified from a plurality of chunks of a data set. The first training set is used to train a machine learning model during a first training-and-evaluation iteration. A first test set comprising records of at least a second chunk is identified using the metadata, and is used to evaluate the model during the first training-and-evaluation iteration.
Publication No.	US2015379425(A1)	Publication Date	2015.12.31
Application No.	US201414460314	Filing Date	2014.08.14
Applicant	Amazon Technologies, Inc.	Inventors	DIRAC LEO PARKER; LI JIN; ZHENG TIANMING; ZHUO DONGHUI
Classification	G06N99/00	Main Classification	G06N99/00
Agency		Agent
Main Claim	1. A system, comprising: one or more computing devices configured to: generate consistency metadata to be used for one or more training-and-evaluation iterations of a machine learning model, wherein the consistency metadata comprises at least a particular initialization parameter value for a pseudo-random number source; sub-divide an address space of a particular data set of the machine learning model into a plurality of chunks, including a first chunk comprising a first plurality of observation records, and a second chunk comprising a second plurality of observation records; retrieve, from one or more persistent storage devices, observation records of the first chunk into a memory of a first server, and observation records of the second chunk into a memory of a second server, select, using a first set of pseudo-random numbers, a first training set from the plurality of chunks, wherein the first training set includes at least a portion of the first chunk, wherein observation records of the first training set are used to train the machine learning model during a first training-and-evaluation iteration of the one or more training-and-evaluation iterations, and wherein the first set of pseudo-random numbers is obtained using the consistency metadata; and select, using a second set of pseudo-random numbers, a first test set from the plurality of chunks, wherein the first test set includes at least a portion of the second chunk, wherein observation records of the first test set are used to evaluate the machine learning model during the first training-and-evaluation iteration, and wherein the second set of pseudo-random numbers is obtained using the consistency metadata.
Address	Reno NV US
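
The selection steps recited in the main claim can be illustrated with a minimal Python sketch. It is not the patented implementation. The names ConsistencyMetadata, split_into_chunks, chunk_draws, select_training_set and select_test_set are hypothetical, as is the train_fraction threshold. The sketch also assumes that both sets of pseudo-random numbers are produced by re-seeding the same generator from the stored seed, and that whole chunks (rather than portions of chunks) are assigned to one set or the other. Records are represented by their integer positions in the data set's address space, and the multi-server retrieval step is omitted.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ConsistencyMetadata:
    """Hypothetical container for the PRNG initialization parameter."""
    seed: int


def split_into_chunks(num_records, chunk_size):
    """Sub-divide the record address space [0, num_records) into chunks."""
    return [
        list(range(start, min(start + chunk_size, num_records)))
        for start in range(0, num_records, chunk_size)
    ]


def chunk_draws(metadata, num_chunks):
    """Replay one pseudo-random number per chunk from the stored seed."""
    rng = random.Random(metadata.seed)
    return [rng.random() for _ in range(num_chunks)]


def select_training_set(chunks, metadata, train_fraction):
    """Keep the chunks whose draw falls below the threshold."""
    draws = chunk_draws(metadata, len(chunks))
    return [rec for chunk, d in zip(chunks, draws) if d < train_fraction
            for rec in chunk]


def select_test_set(chunks, metadata, train_fraction):
    """Keep the complementary chunks, using the same replayed draws."""
    draws = chunk_draws(metadata, len(chunks))
    return [rec for chunk, d in zip(chunks, draws) if d >= train_fraction
            for rec in chunk]


metadata = ConsistencyMetadata(seed=42)
chunks = split_into_chunks(num_records=100, chunk_size=10)
train = select_training_set(chunks, metadata, train_fraction=0.7)
test = select_test_set(chunks, metadata, train_fraction=0.7)
```

Because both selections replay the same sequence from the same seed, the two sets cannot overlap, and repeating either call with the same metadata returns the same records. Under the stated assumptions, this is the kind of consistency the abstract attributes to the metadata.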