Title: Hierarchical content defined segmentation of data
Abstract: A method, system, and computer program product for segmenting data into variable size blocks based on content defined positions. Segmenting probabilities and associated segmenting conditions are defined. The segmenting conditions are ordered in accordance with the associated segmenting probabilities to form a hierarchy of the segmenting conditions. A segmenting condition associated with a highest segmenting probability is defined to be a lowest level segmenting condition in the hierarchy of the segmenting conditions. The segmenting condition associated with a lowest segmenting probability is defined to be a highest level segmenting condition in the hierarchy of the segmenting conditions. Hash values are calculated for each seed block in each consecutive byte position in the data. Each one of the hash values is evaluated using the segmenting conditions. A segmenting position is determined in the data for each hash value that satisfies one of the segmenting conditions.
Publication number: US9244830(B2)  Publication date: 2016.01.26
Application number: US201313942048  Filing date: 2013.07.15
Applicant: GLOBALFOUNDRIES Inc.  Inventor: Aronovich Lior
Classification: G06F12/00;G06F12/02;G06F3/06  Main classification: G06F12/00
Agency:  Agent:
Principal claim: 1. A method for segmenting data into variable size blocks based on content defined positions using a processor device in a computing environment, comprising: defining a plurality of segmenting probabilities and a plurality of segmenting conditions, wherein each of the plurality of segmenting conditions is associated with one of the plurality of segmenting probabilities; ordering the plurality of segmenting conditions in accordance with the associated one of the plurality of segmenting probabilities to form a hierarchy of the plurality of segmenting conditions; defining a segmenting condition associated with a highest segmenting probability to be a lowest level segmenting condition in the hierarchy of the plurality of segmenting conditions, and defining the segmenting condition associated with a lowest segmenting probability to be a highest level segmenting condition in the hierarchy of the plurality of segmenting conditions; calculating a plurality of hash values for each seed block in each consecutive byte position in the data; evaluating each one of the plurality of hash values using the plurality of segmenting conditions; and determining a segmenting position in the data for each hash value of the plurality of hash values that satisfies one of the plurality of segmenting conditions.
Address: Grand Cayman KY
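
The steps recited in claim 1 can be illustrated with a minimal Python sketch. This record does not disclose any implementation details, so every concrete choice here is an assumption made for illustration: the polynomial rolling hash, the 16-byte seed block size, the bit-mask form of each segmenting condition (`hash & mask == 0`, giving a probability of roughly 2^-bits), the placement of the segmenting position at the end of the seed block, and the names `Condition`, `build_hierarchy`, `seed_hashes`, and `segment`.

```python
from dataclasses import dataclass

SEED_SIZE = 16           # assumed seed block size in bytes
BASE = 257               # assumed polynomial hash base
MOD = (1 << 61) - 1      # assumed modulus (a Mersenne prime)


@dataclass(frozen=True)
class Condition:
    probability: float   # approximate chance that a hash satisfies the condition
    mask: int            # condition is satisfied when hash & mask == 0


def build_hierarchy(bit_counts):
    """Define conditions and order them so index 0 is the lowest level.

    The lowest level has the highest segmenting probability; the last
    entry has the lowest probability and is the highest level.
    """
    conds = [Condition(2.0 ** -b, (1 << b) - 1) for b in bit_counts]
    return sorted(conds, key=lambda c: c.probability, reverse=True)


def seed_hashes(data, seed_size=SEED_SIZE):
    """Yield (start, hash) for the seed block at each consecutive byte position."""
    if len(data) < seed_size:
        return
    top = pow(BASE, seed_size - 1, MOD)   # weight of the byte leaving the window
    h = 0
    for b in data[:seed_size]:
        h = (h * BASE + b) % MOD
    yield 0, h
    for i in range(seed_size, len(data)):
        h = (h - data[i - seed_size] * top) % MOD   # drop the oldest byte
        h = (h * BASE + data[i]) % MOD              # append the newest byte
        yield i - seed_size + 1, h


def segment(data, hierarchy, seed_size=SEED_SIZE):
    """Return (position, level) for every hash that satisfies a condition.

    The position is taken to be the end of the matching seed block
    (an assumption); the level is the highest satisfied condition.
    """
    positions = []
    for start, h in seed_hashes(data, seed_size):
        level = None
        for lvl, cond in enumerate(hierarchy):
            if h & cond.mask == 0:
                level = lvl
        if level is not None:
            positions.append((start + seed_size, level))
    return positions
```

A call such as `segment(payload, build_hierarchy([11, 13, 15]))` would return candidate positions tagged with their hierarchy level. Because the assumed masks are nested, a position that satisfies a higher level condition also satisfies every lower one, so a consumer could select coarser or finer block boundaries from the same single pass over the data.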