Title of invention |
Data processing method and apparatus in cluster system |
Abstract |
In embodiments of the present invention, when a duplicate data query is performed on a received data stream, the first physical node in the cluster system that corresponds to each first sketch value representing the data stream is identified, and each first sketch value is then sent to its identified physical node for the duplicate data query. The procedure of the duplicate data query does not change as the number of nodes in the cluster system increases; therefore, the calculation amount of each node does not increase as the number of nodes in the cluster system increases. |
Publication number |
US8892529(B2) |
Publication date |
2014.11.18 |
Application number |
US201314140403 |
Application date |
2013.12.24 |
Applicant |
Huawei Technologies Co., Ltd. |
Inventors |
Liu Qiang;Sun Quancheng;Liu Xiaobo;You Jun;Yang Huadi;Zhou Dan;Huang Yan |
Classification number |
G06F17/30 |
Main classification number |
G06F17/30 |
Agency |
Huawei Technologies Co., Ltd. |
Agent |
Huawei Technologies Co., Ltd. |
Principal claim |
1. A method of data de-duplication performed by a first processing node in a storage system having a plurality of processing nodes each maintaining multiple data containers for storing de-duplicated data chunks, comprising:
receiving a data stream to be stored after de-duplication;
dividing a segment of the data stream into a plurality of super-chunks, each super-chunk including multiple data chunks;
deriving a first super-chunk identification (SID) for a super-chunk of the segment;
identifying a second processing node of the storage system that corresponds to the first SID;
querying the second processing node for a first data container that corresponds to the first SID, wherein the first data container is maintained by a third processing node of the storage system;
obtaining fingerprints of data chunks stored in the first data container that corresponds to the first SID;
based on a comparison between fingerprints of data chunks in the super-chunk and the obtained fingerprints to identify new data chunks whose signatures are not found in the obtained fingerprints;
storing the new data chunks in a local buffer of the first processing node;
selecting, according to a preset storage policy, a second data container of the storage system to write data in the local buffer;
deriving a second SID for data of the local buffer;
identifying, by the same way for identifying the second processing node, a fourth processing node of the storage system that corresponds to the second SID for data of the local buffer; and
storing correspondence between the second SID for data of the local buffer and the second data container in the fourth processing node. |
Address |
Shenzhen CN |
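
The steps recited in claim 1 above can be illustrated with a minimal in-memory sketch. It is not the patented implementation. Everything the record leaves open is an assumption made only for illustration: SHA-1 as the chunk fingerprint, the smallest fingerprint as the SID, `SID mod N` as the rule mapping an SID to a processing node, and "always open a new container on the first processing node" as the preset storage policy. The names `Cluster`, `Node`, `derive_sid`, and `node_for_sid` are hypothetical.

```python
import hashlib


def fingerprint(chunk: bytes) -> str:
    """Chunk fingerprint (assumption: SHA-1 hex digest)."""
    return hashlib.sha1(chunk).hexdigest()


def derive_sid(chunks) -> str:
    """Assumed SID rule: the smallest fingerprint among the chunks."""
    return min(fingerprint(c) for c in chunks)


def node_for_sid(sid: str, num_nodes: int) -> int:
    """Assumed routing rule: SID modulo the number of nodes."""
    return int(sid, 16) % num_nodes


class Node:
    def __init__(self):
        self.sid_index = {}   # SID -> (owner node id, container id)
        self.containers = {}  # container id -> {fingerprint: chunk}


class Cluster:
    def __init__(self, num_nodes: int):
        self.nodes = [Node() for _ in range(num_nodes)]
        self.next_container = 0

    def store_super_chunk(self, first_node_id: int, chunks) -> int:
        """De-duplicate one super-chunk; return the number of new chunks stored."""
        # Derive the first SID and identify the node holding its index entry.
        sid = derive_sid(chunks)
        index_node = self.nodes[node_for_sid(sid, len(self.nodes))]

        # Query that single node for the container matching the SID, then
        # fetch the fingerprints held in that container from its owner node.
        known = set()
        entry = index_node.sid_index.get(sid)
        if entry is not None:
            owner_id, container_id = entry
            known = set(self.nodes[owner_id].containers[container_id])

        # Keep only chunks whose fingerprints were not found (local buffer).
        buffer = {}
        for chunk in chunks:
            fp = fingerprint(chunk)
            if fp not in known:
                buffer[fp] = chunk
        if not buffer:
            return 0

        # Assumed storage policy: write the buffer to a new container
        # on the first processing node.
        container_id = self.next_container
        self.next_container += 1
        self.nodes[first_node_id].containers[container_id] = buffer

        # Derive the second SID from the buffered data and record the
        # SID -> container correspondence on the node it routes to.
        second_sid = derive_sid(list(buffer.values()))
        target = self.nodes[node_for_sid(second_sid, len(self.nodes))]
        target.sid_index[second_sid] = (first_node_id, container_id)
        return len(buffer)


cluster = Cluster(num_nodes=4)
super_chunk = [b"alpha", b"beta", b"gamma"]
first_write = cluster.store_super_chunk(0, super_chunk)
second_write = cluster.store_super_chunk(1, super_chunk)
print(first_write, second_write)
```

In this sketch each lookup contacts one index node and at most one container owner, regardless of how many nodes the cluster holds, which mirrors the abstract's point that the query procedure does not change as the cluster grows. A super-chunk that only partly overlaps stored data may route to a different index node and miss some duplicates; that is a property of the assumed SID rule here, not a statement about the patented method.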