Principal claim |
1. A method of providing data deduplication across first and second storage devices in an encrypted storage system, comprising:
storing respective first and second data units along with respective first and second keyed data digests of the first and second data units at the first and second storage devices, the first and second data units encrypted under respective distinct data encryption keys, the first and second keyed data digests calculated from the respective first and second data units and a data digest key;
engaging in a secure equivalence detection process between the first and second storage devices to determine whether the first data unit stored at the first storage device is a duplicate of the second data unit stored at the second storage device, the process employing two distinct asymmetric key pairs having respective first and second public keys, both key pairs being members of one mathematical prime group having a modulus and a generator, the process including:
an exchange phase including (1) at each of the first and second storage devices, calculating respective first and second products from the respective first and second keyed data digests and the respective first and second public keys and providing the respective first and second products to the second and first storage devices respectively, (2) at the first storage device, calculating a first quotient and a first hash and providing the first hash to the second storage device, the first quotient calculated from the first keyed data digest and first public key and the second product, the first hash calculated as a message digest of the first quotient combined with the first and second products, and (3) at the second storage device, calculating a second quotient and second hash and providing the second hash to the first storage device, the second quotient calculated from the second keyed data digest and second public key and the first product, the second hash calculated as a message digest of the second quotient combined with the first hash; and
a testing phase including one or both of (1) at the second storage device, calculating a first candidate hash and comparing it against the first hash from the first storage device, the first candidate hash calculated as a message digest of the second quotient combined with the first and second products, the comparing generating a second-unit indication whether the second data unit is a duplicate of the first data unit, and (2) at the first storage device, calculating a second candidate hash and comparing it against the second hash from the second storage device, the second candidate hash calculated as a message digest of the first quotient combined with the second hash, the comparing generating a first-unit indication whether the first data unit is a duplicate of the second data unit; and
based upon the first-unit indication and/or the second-unit indication, deleting the data unit at the respective first and second storage devices and creating a respective mapping between an identifier of the respective first or second data unit at the respective first or second storage device and the respective second or first data unit stored in the respective second or first storage device,
wherein the first and second storage devices have different data access characteristics to collectively provide storage over different phases of a data lifecycle, and wherein deduplication is performed as part of migrating the respective first or second data unit from the respective first or second storage devices to the respective second or first storage device. |
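The exchange and testing phases recited above can be illustrated with a small sketch. The claim does not give formulas, so everything concrete here is an assumption made for illustration: the keyed digest is taken to be HMAC-SHA256, the message digest SHA-256, the group a toy one (modulus `P`, base `G`), and each quotient is taken Diffie-Hellman style, as the peer's product divided by the local digest and then raised to the local private exponent. The sketch also assumes the second candidate hash mirrors the second hash (first quotient combined with the first hash), since that is the reading under which the comparison can succeed. The names `Device`, `keyed_digest`, `message_digest`, and `detect_duplicate` are hypothetical, and encryption of the data units under distinct data encryption keys is not modeled.

```python
import hashlib
import hmac
import secrets

# Toy group parameters (assumption, not from the claim): the Mersenne prime
# 2**127 - 1 as modulus and 3 as a fixed base. Not verified to be a generator;
# a real system would use a vetted, standardized group.
P = 2**127 - 1
G = 3


def keyed_digest(digest_key: bytes, data: bytes) -> int:
    """HMAC-SHA256 of a data unit, mapped to a nonzero residue modulo P."""
    raw = hmac.new(digest_key, data, hashlib.sha256).digest()
    return int.from_bytes(raw, "big") % (P - 1) + 1


def message_digest(*parts) -> bytes:
    """SHA-256 over the parts; integers are encoded as 16 fixed bytes."""
    h = hashlib.sha256()
    for part in parts:
        if isinstance(part, int):
            part = part.to_bytes(16, "big")
        h.update(part)
    return h.digest()


class Device:
    """One storage device holding a single data unit (hypothetical model)."""

    def __init__(self, digest_key: bytes, data: bytes):
        self.digest = keyed_digest(digest_key, data)
        self.private = secrets.randbelow(P - 2) + 1  # private exponent
        self.public = pow(G, self.private, P)        # public key

    def product(self) -> int:
        # Keyed digest blinded by the public key.
        return self.digest * self.public % P

    def quotient(self, peer_product: int) -> int:
        # Assumed construction: divide out the local digest, then raise to
        # the local private exponent. Requires Python 3.8+ for pow(x, -1, P).
        unblinded = peer_product * pow(self.digest, -1, P) % P
        return pow(unblinded, self.private, P)


def detect_duplicate(first: Device, second: Device):
    """Return (first_unit_indication, second_unit_indication)."""
    # Exchange phase, step (1): products are swapped.
    p1, p2 = first.product(), second.product()
    # Step (2): first device computes its quotient and the first hash.
    q1 = first.quotient(p2)
    h1 = message_digest(q1, p1, p2)
    # Step (3): second device computes its quotient and the second hash.
    q2 = second.quotient(p1)
    h2 = message_digest(q2, h1)
    # Testing phase: each side recomputes the peer's hash from its own quotient.
    second_unit = hmac.compare_digest(message_digest(q2, p1, p2), h1)
    first_unit = hmac.compare_digest(message_digest(q1, h1), h2)
    return first_unit, second_unit


if __name__ == "__main__":
    key = b"k" * 32  # shared data digest key (illustrative value)
    print(detect_duplicate(Device(key, b"block"), Device(key, b"block")))
    print(detect_duplicate(Device(key, b"block"), Device(key, b"other")))
```

Under these assumptions, equal keyed digests make both quotients reduce to the same shared value, so both comparisons match; with unequal digests the quotients differ except with negligible probability, and neither side learns the other's digest from the exchanged products alone.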