Abstract |
Described is a system for assessing the quality of machine learning algorithms over massive time series. A set of random blocks of a time series data sample of size n is selected in parallel. Then, r resamples are generated, in parallel, by applying a bootstrapping method to each block in the set of random blocks to obtain a resample of size n, where r is not fixed. Errors are estimated on the r resamples, and a final accuracy estimate is produced by averaging the errors estimated on the r resamples. |
Main claim |
1. A system for assessing the quality of machine learning algorithms over time series, the system comprising:
one or more processors and a non-transitory memory having instructions encoded thereon such that when the instructions are executed, the one or more processors perform operations of:
selecting, in parallel, a set of random blocks of a time series data sample, wherein the time series data sample comprises a plurality of data points X1, . . . , Xn, wherein n is the number of data points in the time series data sample, and wherein the time series data sample is a sample of a much larger time series dataset;
generating, in parallel, a set of resamples by applying a bootstrapping method to each block in the set of random blocks to obtain a resample for each block, wherein the number of resamples in the set of resamples is not fixed;
determining errors on the set of resamples, wherein the errors represent variation within the set of resamples;
producing a final accuracy estimate by averaging the errors estimated on the set of resamples, wherein the final accuracy estimate is an estimate of how accurately the set of random blocks represents the time series data sample; and
using the final accuracy estimate to assess a quality of at least one machine learning algorithm over the time series dataset. |
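
The operations recited in claim 1 can be illustrated with a minimal Python sketch. It is one possible reading of the claim, not the claimed implementation, and it rests on several assumptions that do not come from the claim text: the blocks are contiguous windows with random start positions; the bootstrapping method is a moving-block bootstrap that concatenates random sub-blocks until the resample has n points; the "error" for a block is the standard deviation of an estimator across that block's resamples; and "not fixed" is modeled by drawing resamples until that error stabilizes within a tolerance. All function names and parameters (`assess_quality`, `block_len`, `sub_len`, `tol`, `min_r`, `max_r`) are hypothetical, and the sample mean stands in for the output of a machine learning algorithm.

```python
import random
import statistics
from concurrent.futures import ThreadPoolExecutor


def pick_block(series, block_len, rng):
    """Select one contiguous block with a uniformly random start (assumed scheme)."""
    start = rng.randrange(len(series) - block_len + 1)
    return series[start:start + block_len]


def resample_from_block(block, n, sub_len, rng):
    """Moving-block bootstrap (assumed): concatenate random sub-blocks up to size n."""
    out = []
    while len(out) < n:
        start = rng.randrange(len(block) - sub_len + 1)
        out.extend(block[start:start + sub_len])
    return out[:n]


def block_error(block, n, sub_len, estimator, rng, tol=0.05, min_r=10, max_r=200):
    """Draw resamples until the error estimate stabilizes, so r is not fixed."""
    estimates = []
    previous = None
    while len(estimates) < max_r:
        estimates.append(estimator(resample_from_block(block, n, sub_len, rng)))
        if len(estimates) >= min_r:
            current = statistics.stdev(estimates)
            if previous is not None and abs(current - previous) <= tol * max(previous, 1e-12):
                break
            previous = current
    # Error = spread of the estimator across this block's resamples.
    return statistics.stdev(estimates), len(estimates)


def process_block(series, block_len, sub_len, estimator, seed):
    """One worker: select a random block, then bootstrap it."""
    rng = random.Random(seed)  # per-worker generator keeps runs reproducible
    block = pick_block(series, block_len, rng)
    return block_error(block, len(series), sub_len, estimator, rng)


def assess_quality(series, num_blocks, block_len, sub_len, estimator, seed=0):
    """Return (final accuracy estimate, list of r used per block)."""
    if not 2 <= sub_len <= block_len <= len(series):
        raise ValueError("need 2 <= sub_len <= block_len <= len(series)")
    # Threads stand in for parallel workers; a process pool or a cluster
    # would be the natural choice for a genuinely massive series.
    with ThreadPoolExecutor() as pool:
        futures = [
            pool.submit(process_block, series, block_len, sub_len, estimator, seed + i)
            for i in range(num_blocks)
        ]
        results = [f.result() for f in futures]
    errors = [err for err, _ in results]
    # Final accuracy estimate: average of the per-block errors.
    return statistics.fmean(errors), [r for _, r in results]


# Usage on synthetic data (illustrative only).
gen = random.Random(42)
sample = [gen.gauss(0.0, 1.0) for _ in range(2000)]
accuracy, r_per_block = assess_quality(
    sample, num_blocks=5, block_len=200, sub_len=20, estimator=statistics.fmean
)
print("final accuracy estimate:", accuracy)
print("resamples drawn per block:", r_per_block)
```

Each worker only ever holds one block of `block_len` points plus one resample at a time, which is what makes this style of procedure attractive when the full series is too large to bootstrap directly. Sub-blocks rather than individual points are resampled so that short-range temporal dependence inside each sub-block is preserved.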