Abstract
Embodiments associated with configurable, repeatable data generation are described. One example method includes manipulating a redundancy parameter that controls data redundancy in binary large objects (BLOBs) to be included in a generated data set. The redundancy parameter may control variations in repeatable, variable-length sequences included in the BLOBs. The example method also includes manipulating one or more parameters that control custom-designed sequences included in the BLOBs. With the redundancy and the custom-designed sequences described, the example method then generates BLOBs based, at least in part, on the redundancy parameter and the custom-designed sequences. The BLOBs may include byte sequences repeated at different frequencies as well as configurable, user-designed sequences. Manipulating the redundancy parameter, manipulating the parameters that control the custom-designed sequences, generating the BLOBs, and providing the BLOBs may be performed by separate processes acting in parallel.
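The abstract's two central ideas, repeatable generation driven by a seed and separate stages running in parallel, can be illustrated with a small pipeline. This is a minimal sketch under stated assumptions, not the described embodiment: threads connected by queues stand in for the separate processes, "redundancy" is modeled as the chance that a fixed-size chunk repeats an earlier chunk of the same BLOB, and each BLOB receives one custom sequence. All function names, the chunk size, and the redundancy levels are hypothetical.

```python
import queue
import random
import threading

_DONE = object()  # sentinel marking the end of a stream between stages


def parameter_stage(seed, blob_count, custom_sequences, out_q):
    """Stage 1: manipulate per-BLOB redundancy and pick a custom sequence for each BLOB."""
    rng = random.Random(seed)
    for index in range(blob_count):
        redundancy = rng.choice([0.0, 0.25, 0.5, 0.75])  # chance a chunk is a repeat
        custom = rng.choice(custom_sequences)
        out_q.put((index, redundancy, custom))
    out_q.put(_DONE)


def generate_blob(seed, index, redundancy, custom, size, chunk=16):
    """Build one BLOB from fresh and repeated chunks, then embed one custom sequence."""
    rng = random.Random(seed * 1_000_003 + index)  # per-BLOB seed keeps output repeatable
    chunks, total = [], 0
    while total < size:
        if chunks and rng.random() < redundancy:
            piece = rng.choice(chunks)  # repeat a chunk already in this BLOB
        else:
            piece = bytes(rng.randrange(256) for _ in range(chunk))
        chunks.append(piece)
        total += len(piece)
    blob = b"".join(chunks)[:size]
    pos = rng.randrange(size - len(custom) + 1)  # custom sequence must fit inside the BLOB
    return blob[:pos] + custom + blob[pos + len(custom):]


def generation_stage(seed, size, in_q, out_q):
    """Stage 2: turn each parameter record into a BLOB as soon as it arrives."""
    while True:
        item = in_q.get()
        if item is _DONE:
            break
        index, redundancy, custom = item
        out_q.put((index, generate_blob(seed, index, redundancy, custom, size)))
    out_q.put(_DONE)


def providing_stage(in_q, sink):
    """Stage 3: hand BLOBs onward; a list stands in for the consumer."""
    while True:
        item = in_q.get()
        if item is _DONE:
            break
        sink.append(item)


def run_pipeline(seed=7, blob_count=4, size=256,
                 custom_sequences=(b"\xde\xad\xbe\xef", b"HELLO")):
    """Run the three stages concurrently and return the BLOBs in index order."""
    params_q, blobs_q, sink = queue.Queue(), queue.Queue(), []
    stages = [
        threading.Thread(target=parameter_stage,
                         args=(seed, blob_count, list(custom_sequences), params_q)),
        threading.Thread(target=generation_stage, args=(seed, size, params_q, blobs_q)),
        threading.Thread(target=providing_stage, args=(blobs_q, sink)),
    ]
    for stage in stages:
        stage.start()
    for stage in stages:
        stage.join()
    return [blob for _, blob in sorted(sink)]
```

Calling `run_pipeline(seed=7)` twice returns identical BLOB lists, which is the repeatability property the abstract emphasizes; a different seed yields a different data set. Replacing `threading.Thread` with `multiprocessing.Process` and `queue.Queue` with `multiprocessing.Queue` would give truly separate processes, although the list sink would then also need to become a queue, since plain lists are not shared across processes.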
Claims
1. A non-transitory computer-readable medium storing computer-executable instructions that when executed by a computer cause the computer to perform a method, the method comprising:
manipulating one or more redundancy parameters that control redundancy in data to be generated, where manipulating the one or more redundancy parameters includes manipulating a degree of internal redundancy for a subset of the data, a degree of external redundancy between subsets of the data, a frequency with which internal redundancy is to vary, or a frequency with which external redundancy is to vary;
manipulating one or more parameters that control custom-designed sequences to be included in the data, where manipulating the one or more parameters that control custom-designed sequences includes manipulating a sequence length distribution, where manipulating the sequence length distribution follows a kurtosis rule, and where the kurtosis rule defines the sequence length distribution to follow a geometric frequency distribution;
generating the data based, at least in part, on the one or more redundancy parameters, where the data includes one or more variable custom-designed sequences, where the data comprises one or more binary large objects exhibiting byte-sequence variability with binary large object dispersion, where the data includes redundant spans that are specified as random-seed-generated, variable-length patterns from within a constrained number-space, and where the redundant spans are controlled, at least in part, by the one or more redundancy parameters; and
providing the data from the computer to a data de-duplicator, where manipulating the one or more redundancy parameters, manipulating the one or more parameters that control custom-designed sequences, generating the data, and providing the data are performed at least partially in parallel on the computer.
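The claim's generation parameters can be made concrete with a small generator. The sketch below is an assumption-laden illustration, not the claimed implementation: it draws span lengths from a geometric distribution by inverse-CDF sampling (the claim states only that the kurtosis rule yields a geometric frequency distribution; the rule itself is not modeled), treats the "constrained number-space" as byte values below a hypothetical `alphabet` bound, and models internal and external redundancy as the probabilities of reusing a pattern from the current BLOB's own pool or from a pool shared by all BLOBs. The names `geometric_length`, `make_pattern`, `generate_blobs`, `internal_levels`, `external`, and `vary_every` are hypothetical.

```python
import math
import random


def geometric_length(rng, p, max_len=64):
    """Sample a span length k >= 1 with P(k) = (1 - p)**(k - 1) * p, for 0 < p < 1."""
    v = 1.0 - rng.random()  # v lies in (0, 1], so log(v) is defined
    k = int(math.floor(math.log(v) / math.log(1.0 - p))) + 1  # inverse-CDF sampling
    return min(k, max_len)  # cap the rare very long spans


def make_pattern(rng, p, alphabet):
    """Seed-driven, variable-length pattern whose bytes lie in range(alphabet)."""
    return bytes(rng.randrange(alphabet) for _ in range(geometric_length(rng, p)))


def generate_blobs(seed, blob_count, blob_size, p=0.25, alphabet=16,
                   internal_levels=(0.0, 0.3, 0.6), external=0.2,
                   vary_every=2, shared_pool_size=8):
    """Generate BLOBs with internal (within-BLOB) and external (cross-BLOB) redundancy.

    Every `vary_every` BLOBs, the internal-redundancy level steps to the next entry of
    `internal_levels`, modeling a frequency with which internal redundancy varies.
    `external` plus the largest internal level should not exceed 1.
    """
    rng = random.Random(seed)
    shared_pool = [make_pattern(rng, p, alphabet) for _ in range(shared_pool_size)]
    blobs = []
    for index in range(blob_count):
        internal = internal_levels[(index // vary_every) % len(internal_levels)]
        local_pool, parts, total = [], [], 0
        while total < blob_size:
            r = rng.random()
            if r < external:
                span = rng.choice(shared_pool)          # external redundancy
            elif local_pool and r < external + internal:
                span = rng.choice(local_pool)           # internal redundancy
            else:
                span = make_pattern(rng, p, alphabet)   # fresh pattern
                local_pool.append(span)
            parts.append(span)
            total += len(span)
        blobs.append(b"".join(parts)[:blob_size])
    return blobs
```

Raising `external` increases repetition across BLOBs, which a de-duplicator can exploit between objects, while larger `internal_levels` entries increase repetition inside a single BLOB. Because every choice flows from one seeded `random.Random`, the same arguments always reproduce the same data set.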