Title of invention: Sorting a data set by using a limited amount of memory in a processing system
Abstract: An efficient and highly scalable method is disclosed for sorting an input file in a processing system by using only a limited amount (i.e., a portion) of the memory in the processing system, where that amount of memory is substantially smaller than the input file. The input file can be, for example, a fingerprint database for use in deduplication, and the processing system can be, for example, a network storage server. The merge phase is broken down into sub-phases, where each sub-phase takes a predetermined number of subsets of a fingerprint file to merge and writes them back as a sorted, merged group. The number of threads used to process these groups can depend on the number of central processing units (CPUs) present in the system and can be dynamically tuned to achieve a desired level of performance.
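Publication number: US9268832(B1); Publication date: 2016.02.23
Application number: US201012782619; Filing date: 2010.05.18
Applicant: NetApp, Inc.; Inventor: Challapalli Venkata Vijay Chaitanya
Classification: G06F17/30; Main classification: G06F17/30
Agency: Gilliam IP PLLC; Agent: Gilliam IP PLLC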
Principal claim: 1. A method of sorting an input file, the method comprising: creating and sorting a plurality of temporary files, each temporary file including the contents of a different subset of a plurality of subsets of the input file; defining a plurality of groups from the plurality of temporary files, each group of the plurality of groups including all of the contents of two or more of the plurality of temporary files; selecting two or more of the groups; sorting concurrently, in memory of a processing system, the contents of each selected group, by using a separate execution thread of a plurality of execution threads to sort each selected group, where each execution thread sorts contents of an associated selected group stored within a separate portion of the memory; and merging sorted contents of the selected two or more groups into a single file.
Address: Sunnyvale, CA, US
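
Illustrative sketch (not part of the patent record): the steps recited in the abstract and the principal claim can be approximated in Python as shown below. This is a minimal sketch under stated assumptions, not the patented implementation. The function names (`write_sorted_runs`, `merge_files`, `external_sort`) and the parameters `lines_per_run`, `group_size`, and `threads` are hypothetical. The input is assumed to be a text file with one record per line, compared as plain strings. Where the claim describes each thread sorting its group within a separate portion of memory, the sketch substitutes a streaming k-way merge (`heapq.merge`) per group, so each thread holds only one line per temporary file at a time. The default thread count is taken from `os.cpu_count()`, loosely following the abstract's remark that the number of threads can depend on the number of CPUs.

```python
import heapq
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def write_sorted_runs(input_path, lines_per_run, work_dir):
    """Split the input into sorted temporary files small enough to sort in memory."""
    run_paths = []
    with open(input_path) as src:
        while True:
            chunk = list(islice(src, lines_per_run))
            if not chunk:
                break
            # Make sure every record ends with a newline so merged lines never fuse.
            chunk = [ln if ln.endswith("\n") else ln + "\n" for ln in chunk]
            chunk.sort()
            fd, path = tempfile.mkstemp(dir=work_dir, suffix=".run")
            with os.fdopen(fd, "w") as out:
                out.writelines(chunk)
            run_paths.append(path)
    return run_paths


def merge_files(paths, work_dir):
    """Merge already-sorted files into one new sorted file; return its path."""
    fd, out_path = tempfile.mkstemp(dir=work_dir, suffix=".merged")
    files = [open(p) for p in paths]
    try:
        with os.fdopen(fd, "w") as out:
            # heapq.merge streams the inputs, keeping one line per file in memory.
            out.writelines(heapq.merge(*files))
    finally:
        for f in files:
            f.close()
    for p in paths:
        os.remove(p)
    return out_path


def external_sort(input_path, output_path, lines_per_run=1000,
                  group_size=4, threads=None):
    """Sort input_path into output_path using bounded memory (hypothetical API)."""
    if group_size < 2:
        raise ValueError("group_size must be at least 2")
    threads = threads or os.cpu_count() or 1
    out_dir = os.path.dirname(os.path.abspath(output_path))
    # Keep temporary files next to the output so the final rename stays on one filesystem.
    with tempfile.TemporaryDirectory(dir=out_dir) as work_dir:
        paths = write_sorted_runs(input_path, lines_per_run, work_dir)
        if not paths:
            open(output_path, "w").close()
            return
        # Each loop iteration is one merge sub-phase: groups of temporary files
        # are merged concurrently, one group per worker thread.
        while len(paths) > 1:
            groups = [paths[i:i + group_size]
                      for i in range(0, len(paths), group_size)]
            with ThreadPoolExecutor(max_workers=threads) as pool:
                paths = list(pool.map(lambda g: merge_files(g, work_dir), groups))
        os.replace(paths[0], output_path)
```

A call such as `external_sort("fingerprints.txt", "fingerprints.sorted.txt", lines_per_run=100000, group_size=8)` would run the sketch on a hypothetical fingerprint file. In CPython, threads mainly overlap file I/O rather than CPU work, so a process pool or a different runtime may be a better fit where the comparison cost dominates.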