Title SYSTEMS FOR PARALLEL PROCESSING OF DATASETS WITH DYNAMIC SKEW COMPENSATION
Abstract Systems and methods are provided for parallel processing of datasets with dynamic skew compensation. The disclosed systems and methods may increase the efficiency of dataset processing by imposing maximum size limits on parallel processing environment tasks. The disclosed systems and methods may generate a target partition of a variable, a database storing data elements, a cluster that generates one or more output files based on the target partition and the data elements, and a display device that displays analysis results for the target partition using the one or more output files. Generation may comprise creating a calculation partition, mapping data elements according to the calculation partition, and generating the one or more output files based on the mapped data elements. The calculation partition may depend on a target partition and a uniform partition that partitions values based on one or more of statistical measures and pseudorandom functions.
Publication number US2017083384(A1) Publication date 2017.03.23
Application number US201615271937 Filing date 2016.09.21
Applicant CAPITAL ONE SERVICES, LLC Inventors STOCKER John; KUMAR Sunny
Classification G06F9/52; G06F9/50 Main classification G06F9/52
Agency Agent
Main claim 1. A cluster for parallel processing of datasets with dynamic skew compensation, comprising: at least one first worker node, each first worker node comprising a processor and a storage medium comprising instructions that cause the processor to process data elements received from a datasource into intermediate data associated with a calculation partition determined by a first function of (i) a first target partition of a first variable and (ii) a first uniform partition of the first variable, the first uniform partition dividing first variable values into similarly sized groups; and at least one second worker node, each second worker node comprising a processor and a storage medium comprising instructions that cause the processor to receive the intermediate data and generate output data for provision to a display device or for subsequent processing, the output data based on the intermediate data.
Address McLean VA US
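
The main claim describes first worker nodes that map data elements into a calculation partition derived from a target partition and a uniform partition, and second worker nodes that turn the intermediate data into output. The following is a minimal single-process sketch of that idea, not the patented implementation. It assumes that partitions are represented as sorted lists of boundary values, that the uniform partition is built from quantiles (one possible "statistical measure"), and that the "first function" is the common refinement of the two partitions. All function names are hypothetical.

```python
import bisect
from collections import defaultdict


def uniform_boundaries(sample, n_groups):
    """Assumed uniform partition: quantile cut points that split the
    sorted sample into n_groups similarly sized groups."""
    ordered = sorted(sample)
    return [ordered[len(ordered) * i // n_groups] for i in range(1, n_groups)]


def calculation_boundaries(target, uniform):
    """Assumed 'first function': the common refinement of both partitions,
    i.e. the sorted union of their boundary values."""
    return sorted(set(target) | set(uniform))


def map_phase(values, calc):
    """First worker role: group values by calculation-partition bucket.
    Bucket i holds values v with calc[i-1] <= v < calc[i]."""
    intermediate = defaultdict(list)
    for v in values:
        intermediate[bisect.bisect_right(calc, v)].append(v)
    return intermediate


def reduce_phase(intermediate, target):
    """Second worker role: fold calculation buckets back into counts per
    target bucket. Every calculation bucket lies inside exactly one target
    bucket, so any member value identifies it."""
    counts = defaultdict(int)
    for members in intermediate.values():
        counts[bisect.bisect_right(target, members[0])] += len(members)
    return dict(counts)


if __name__ == "__main__":
    values = list(range(100))   # illustrative data only
    target = [10]               # two target buckets: v < 10 and v >= 10
    uniform = uniform_boundaries(values, 4)
    calc = calculation_boundaries(target, uniform)
    intermediate = map_phase(values, calc)
    print(max(len(m) for m in intermediate.values()))
    print(reduce_phase(intermediate, target))
```

In this sketch the second target bucket would hold 90 of the 100 values if processed as one task. After refinement with the uniform partition, no intermediate group exceeds a quarter of the data, which is the kind of maximum task size limit the abstract refers to.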