Title: Data processing method and apparatus in cluster system
Abstract: In embodiments of the present invention, a duplicate-data query on a received data stream works as follows. For each first sketch value representing the data stream, the first physical node in the cluster system that corresponds to that sketch value is identified. The first sketch value is then sent to the identified physical node, which performs the duplicate-data query. The query procedure does not change as nodes are added to the cluster system, so the amount of computation on each node does not grow with the number of nodes.
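The routing idea in the abstract can be illustrated with a small Python sketch. The abstract does not say how a sketch value is computed or how it is mapped to a node, so both rules here are assumptions. `sketch_value` takes the minimum SHA-1 fingerprint of fixed-size chunks, which is a min-hash style sketch. `node_for_sketch` picks a node by modulo. Both function names are hypothetical.

```python
import hashlib


def chunk_fingerprints(data: bytes, chunk_size: int = 4) -> list[int]:
    """Split data into fixed-size chunks and return an integer SHA-1 fingerprint per chunk."""
    return [
        int.from_bytes(hashlib.sha1(data[i:i + chunk_size]).digest(), "big")
        for i in range(0, len(data), chunk_size)
    ]


def sketch_value(data: bytes, chunk_size: int = 4) -> int:
    """Hypothetical sketch value: the minimum chunk fingerprint (min-hash) of the data."""
    return min(chunk_fingerprints(data, chunk_size))


def node_for_sketch(sketch: int, num_nodes: int) -> int:
    """Hypothetical placement rule: send a sketch value to exactly one node, chosen by modulo."""
    return sketch % num_nodes


# Two segments holding the same chunks in a different order produce the same sketch,
# so their duplicate-data queries go to the same node.
s = sketch_value(b"abcdefgh")
print(node_for_sketch(s, 4) == node_for_sketch(sketch_value(b"efghabcd"), 4))
```

Each sketch value is sent to exactly one node whatever the cluster size, so the work per query does not depend on how many nodes the cluster has. Plain modulo placement does reassign most sketch values when the node count changes. A deployed system might prefer consistent hashing for that reason, but this sketch does not attempt it.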
Publication number: US8892529(B2) Publication date: 2014.11.18
Application number: US201314140403 Filing date: 2013.12.24
Applicant: Huawei Technologies Co., Ltd. Inventors: Liu Qiang; Sun Quancheng; Liu Xiaobo; You Jun; Yang Huadi; Zhou Dan; Huang Yan
Classification: G06F17/30 Main classification: G06F17/30
Agency: Huawei Technologies Co., Ltd. Agent: Huawei Technologies Co., Ltd.
Main claim: 1. A method of data de-duplication performed by a first processing node in a storage system having a plurality of processing nodes, each maintaining multiple data containers for storing de-duplicated data chunks, the method comprising: receiving a data stream to be stored after de-duplication; dividing a segment of the data stream into a plurality of super-chunks, each super-chunk including multiple data chunks; deriving a first super-chunk identification (SID) for a super-chunk of the segment; identifying a second processing node of the storage system that corresponds to the first SID; querying the second processing node for a first data container that corresponds to the first SID, wherein the first data container is maintained by a third processing node of the storage system; obtaining fingerprints of data chunks stored in the first data container that corresponds to the first SID; identifying, based on a comparison between fingerprints of data chunks in the super-chunk and the obtained fingerprints, new data chunks whose fingerprints are not found among the obtained fingerprints; storing the new data chunks in a local buffer of the first processing node; selecting, according to a preset storage policy, a second data container of the storage system to which the data in the local buffer is to be written; deriving a second SID for the data in the local buffer; identifying, in the same way as the second processing node was identified, a fourth processing node of the storage system that corresponds to the second SID; and storing, in the fourth processing node, the correspondence between the second SID and the second data container.
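The steps of claim 1 can be traced in a single-process Python simulation. Everything the claim does not specify is an assumption made here for illustration:

* fixed-size chunking;
* SHA-1 fingerprints;
* an SID equal to the smallest chunk fingerprint;
* modulo routing by SID;
* a "preset storage policy" that opens a new container on the node holding the fewest containers.

The names `Node`, `Cluster`, `derive_sid`, and `dedup_store` are hypothetical. Network requests between nodes are replaced by direct lookups.

```python
import hashlib
from dataclasses import dataclass, field


def fp(chunk: bytes) -> str:
    # Chunk fingerprint (assumption: SHA-1 hex digest).
    return hashlib.sha1(chunk).hexdigest()


def derive_sid(chunks: list[bytes]) -> str:
    # Hypothetical SID: the smallest chunk fingerprint in the group (min-hash style).
    return min(fp(c) for c in chunks)


@dataclass
class Node:
    node_id: int
    # container id -> {fingerprint: chunk bytes}
    containers: dict[int, dict[str, bytes]] = field(default_factory=dict)
    # SID -> (id of the node holding the container, container id)
    sid_index: dict[str, tuple[int, int]] = field(default_factory=dict)
    # local buffer of new, not yet stored chunks
    buffer: list[bytes] = field(default_factory=list)


class Cluster:
    def __init__(self, num_nodes: int) -> None:
        self.nodes = [Node(i) for i in range(num_nodes)]

    def node_for_sid(self, sid: str) -> Node:
        # Hypothetical routing rule, used for both the first and the second SID.
        return self.nodes[int(sid, 16) % len(self.nodes)]

    def dedup_store(self, first: Node, segment: bytes,
                    chunk_size: int = 4, chunks_per_super: int = 2) -> int:
        """Apply the claimed steps to one segment; return the number of new chunks stored."""
        chunks = [segment[i:i + chunk_size] for i in range(0, len(segment), chunk_size)]
        supers = [chunks[i:i + chunks_per_super] for i in range(0, len(chunks), chunks_per_super)]
        buffered = {fp(c) for c in first.buffer}
        for sc in supers:
            sid = derive_sid(sc)                                   # first SID
            location = self.node_for_sid(sid).sid_index.get(sid)  # query the "second" node
            known: set[str] = set()
            if location is not None:                               # container on a "third" node
                holder, cid = location
                known = set(self.nodes[holder].containers[cid])    # obtained fingerprints
            for c in sc:                                           # keep only unseen chunks
                f = fp(c)
                if f not in known and f not in buffered:
                    first.buffer.append(c)
                    buffered.add(f)
        if not first.buffer:
            return 0
        # Hypothetical preset policy: new container on the node with the fewest containers.
        target = min(self.nodes, key=lambda n: len(n.containers))
        cid = len(target.containers)
        target.containers[cid] = {fp(c): c for c in first.buffer}
        sid2 = derive_sid(first.buffer)                            # second SID, for the buffer
        self.node_for_sid(sid2).sid_index[sid2] = (target.node_id, cid)  # "fourth" node
        stored = len(first.buffer)
        first.buffer.clear()
        return stored


cluster = Cluster(3)
print(cluster.dedup_store(cluster.nodes[0], b"abcdefgh"))
print(cluster.dedup_store(cluster.nodes[1], b"abcdefgh"))
```

In this sketch, storing `b"abcdefgh"` twice from two different first nodes behaves as follows. The first call stores two chunks. The second call stores none, because both runs derive the same SID and route it to the same index node. The index records only one SID per buffer flush. Any super-chunk whose SID was never registered is therefore treated as new, which makes this sampling-based de-duplication approximate.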
Address: Shenzhen CN