Title of invention: APPLYING A MAXIMUM SIZE BOUND ON CONTENT DEFINED SEGMENTATION OF DATA
Abstract: Applying a content defined maximum size bound on blocks produced by content defined segmentation of data by calculating the size of the interval of data between a newly found candidate segmenting position and a last candidate segmenting position of the same or higher hierarchy level, and then using the intermediate candidate segmenting positions of that interval if the size of the interval exceeds the maximum size bound, or discarding the intermediate candidate segmenting positions of that interval if the size of the interval does not exceed the maximum size bound.
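The interval rule in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `add_candidate`, the `(position, level)` tuple layout, and the choice to measure from position 0 when no earlier candidate of the same or higher level exists are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical representation: (byte position, hierarchy level).
Candidate = Tuple[int, int]


def add_candidate(recorded: List[Candidate], new: Candidate,
                  max_bound: int) -> List[Candidate]:
    """Return the recorded candidates after a new candidate is found.

    Intermediate candidates between the new candidate and the last
    candidate of the same or higher level are kept only if that
    interval is larger than max_bound; otherwise they are discarded.
    """
    new_pos, new_level = new

    # Search backwards for the last candidate of the same or higher level.
    anchor_index = -1
    for i in range(len(recorded) - 1, -1, -1):
        if recorded[i][1] >= new_level:
            anchor_index = i
            break

    # Assumption: with no such candidate, measure from the start of the data.
    anchor_pos = recorded[anchor_index][0] if anchor_index >= 0 else 0
    interval_size = new_pos - anchor_pos

    kept = recorded[:anchor_index + 1]
    intermediates = recorded[anchor_index + 1:]
    if interval_size > max_bound:
        # Interval too large: the intermediate candidates are used.
        kept = kept + intermediates
    # Otherwise the intermediate candidates are dropped.
    return kept + [new]
```

For example, with candidates recorded at positions 100 (level 2), 150 (level 0) and 220 (level 1), a new level 2 candidate at position 300 spans an interval of 200 bytes. Under a bound of 150 the two intermediate candidates are kept; under a bound of 256 they are discarded.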
Publication number: US2015019510(A1)  Publication date: 2015.01.15
Application number: US201313942009  Filing date: 2013.07.15
Applicant: ARONOVICH Lior  Inventor: ARONOVICH Lior
Classification: G06F17/30  Main classification: G06F17/30
Agency:  Agent:
Principal claim: 1. A method for applying a content defined maximum size bound on content defined segmentation of data into blocks using a processor device in a computing environment, comprising: defining a plurality of segmenting probabilities and a plurality of segmenting conditions, wherein each of the plurality of segmenting conditions is associated with one of the plurality of segmenting probabilities; ordering the plurality of segmenting conditions in accordance with the associated one of the plurality of segmenting probabilities to form a hierarchy of the plurality of segmenting conditions; defining a segmenting condition associated with a highest segmenting probability to be a lowest level segmenting condition in the hierarchy of the plurality of segmenting conditions, and defining the segmenting condition associated with a lowest segmenting probability to be a highest level segmenting condition in the hierarchy of the plurality of segmenting conditions; defining a maximum bound on a size of a block; calculating a plurality of hash values for each seed block in each consecutive byte position in the data; evaluating each one of the plurality of hash values using the plurality of segmenting conditions; determining a position of one of the plurality of hash values as a candidate segmenting position in the data if at least one of the plurality of segmenting conditions is satisfied by the hash value; defining a hierarchy level of a candidate segmenting position as the hierarchy level of the highest level segmenting condition that is satisfied by the one of the plurality of hash values of the candidate segmenting position; recording candidate segmenting positions with hierarchy levels of the candidate segmenting positions; calculating the size of the interval of data between a newly found candidate segmenting position and a previous candidate segmenting position; and determining the candidate segmenting positions of the interval of data to be actual segmenting positions if the size of the interval of data exceeds the maximum bound on the size of the block.
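The candidate detection steps of the claim (hierarchical segmenting conditions, hashing seed blocks at consecutive byte positions, recording candidates with their levels) might be sketched as follows. Everything concrete here is an assumption for illustration and is not taken from the claim: the condition at level k is modeled as "the low `base_bits + k` bits of the hash are zero" (so each higher level is satisfied with half the probability of the level below), the hash is a simple polynomial hash rather than a rolling hash, one hash value is computed per seed block, and the names `hierarchy_level`, `seed_hash` and `find_candidates` are hypothetical.

```python
from typing import List, Optional, Tuple


def hierarchy_level(hash_value: int, base_bits: int,
                    num_levels: int) -> Optional[int]:
    """Return the highest level whose condition the hash satisfies.

    Assumed condition for level k: the low (base_bits + k) bits are
    all zero. Level 0 has the highest probability and is therefore
    the lowest level. Returns None if no condition is satisfied.
    """
    level = None
    for k in range(num_levels):
        mask = (1 << (base_bits + k)) - 1
        if hash_value & mask == 0:
            level = k
        else:
            break  # conditions are nested, so higher levels fail too
    return level


def seed_hash(seed: bytes) -> int:
    """Illustrative 32-bit polynomial hash of one seed block."""
    h = 0
    for b in seed:
        h = (h * 31 + b) & 0xFFFFFFFF
    return h


def find_candidates(data: bytes, seed_size: int, base_bits: int,
                    num_levels: int) -> List[Tuple[int, int]]:
    """Record (position, level) for every candidate segmenting position."""
    candidates = []
    for pos in range(len(data) - seed_size + 1):
        h = seed_hash(data[pos:pos + seed_size])
        level = hierarchy_level(h, base_bits, num_levels)
        if level is not None:
            candidates.append((pos, level))
    return candidates
```

A production implementation would typically replace `seed_hash` with a rolling hash so that each byte position costs constant time; the recomputed hash is used here only to keep the sketch short.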
Address: Thornhill CA