Title: APPLYING A MINIMUM SIZE BOUND ON CONTENT DEFINED SEGMENTATION OF DATA
Abstract: A content defined minimum size bound is applied to blocks produced by content defined segmentation of data by calculating the size of the interval of data between a newly found candidate segmenting position and the last candidate segmenting position of the same or a higher hierarchy level. The newly found candidate segmenting position is discarded if the size of the interval of data is lower than the minimum size bound. It is retained if the size of the interval of data is not lower than the minimum size bound, or if there is no last candidate segmenting position of the same or a higher hierarchy level as the newly found candidate segmenting position. When a last candidate segmenting position of the same or a higher hierarchy level becomes available, the evaluation is reiterated to converge edge segmenting positions of the outputs of consecutive calculation units.
Publication number: US2015019511(A1) Publication date: 2015.01.15
Application number: US201313942027 Filing date: 2013.07.15
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION Inventor: ARONOVICH Lior
Classification: G06F17/30 Main classification: G06F17/30
Agency: Agent:
Main claim: 1. A method for applying a content defined minimum size bound on content defined segmentation of data into blocks using a processor device in a computing environment, comprising: defining a plurality of segmenting probabilities and a plurality of segmenting conditions, wherein each of the plurality of segmenting conditions is associated with one of the plurality of segmenting probabilities; ordering the plurality of segmenting conditions in accordance with the associated one of the plurality of segmenting probabilities to form a hierarchy of the plurality of segmenting conditions; defining a segmenting condition associated with a highest segmenting probability to be a lowest level segmenting condition in the hierarchy of the plurality of segmenting conditions, and defining the segmenting condition associated with a lowest segmenting probability to be a highest level segmenting condition in the hierarchy of the plurality of segmenting conditions; defining a minimum bound on a size of a block; calculating a plurality of hash values for each seed block in each consecutive byte position in the data; evaluating each one of the plurality of hash values using the plurality of segmenting conditions; determining a position of one of the plurality of hash values as a candidate segmenting position in the data if at least one of the plurality of segmenting conditions is satisfied by the hash value; defining a hierarchy level of a candidate segmenting position as the hierarchy level of the highest level segmenting condition that is satisfied by the one of the plurality of hash values of the candidate segmenting position; calculating the size of the interval of data between a newly found candidate segmenting position and a previous candidate segmenting position; and discarding the newly found candidate segmenting position if the size of the interval of data is lower than the minimum bound on the size of a block.
Address: Armonk NY US
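
The steps recited in the abstract and main claim can be illustrated with a minimal Python sketch. It is not the patented implementation, and everything specific in it is an assumption made for illustration: the seed block size, the number of hierarchy levels, the use of `zlib.crc32` as the hash, the form of the segmenting conditions (level k is satisfied when the low `BASE_BITS + k` bits of the hash are zero, giving a probability of 2^-(BASE_BITS + k)), and the choice to compare a new candidate only against previously retained candidates. The names `hierarchy_level`, `find_candidates`, and `apply_min_bound` are hypothetical.

```python
import zlib

# Illustrative parameters (assumptions, not taken from the patent).
SEED_SIZE = 8        # bytes hashed at each byte position
BASE_BITS = 4        # lowest level condition: low 4 bits of the hash are zero
NUM_LEVELS = 3       # hierarchy levels 0 (most probable) to 2 (least probable)
MIN_BLOCK_SIZE = 32  # minimum bound on the size of a block, in bytes


def hierarchy_level(hash_value):
    """Return the highest level whose condition the hash satisfies, or None."""
    level = None
    for k in range(NUM_LEVELS):
        mask = (1 << (BASE_BITS + k)) - 1
        if hash_value & mask == 0:
            level = k  # a higher k means a lower segmenting probability
    return level


def find_candidates(data):
    """Yield (position, level) for every candidate segmenting position."""
    for pos in range(len(data) - SEED_SIZE + 1):
        # Hash of the seed block starting at this byte position.
        h = zlib.crc32(data[pos:pos + SEED_SIZE])
        level = hierarchy_level(h)
        if level is not None:
            yield pos, level


def apply_min_bound(candidates, min_size=MIN_BLOCK_SIZE):
    """Discard candidates that lie closer than min_size to the last retained
    candidate of the same or a higher hierarchy level."""
    last_at_level = [None] * NUM_LEVELS  # last retained position per level
    retained = []
    for pos, level in candidates:
        # Positions of retained candidates at this level or above.
        prior = [p for p in last_at_level[level:] if p is not None]
        if prior and pos - max(prior) < min_size:
            continue  # interval is below the minimum bound: discard
        # Retained: interval is large enough, or no comparable candidate exists.
        last_at_level[level] = pos
        retained.append((pos, level))
    return retained
```

Calling `apply_min_bound(find_candidates(data))` on a `bytes` object returns the retained positions with their levels. Because a candidate is compared only against candidates of the same or a higher level, a higher level candidate may still be retained close to a preceding lower level one in this sketch.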