Title of invention: Parallelizing backup and restore for network-attached storage
Abstract: The subject disclosure is directed towards the parallel backing up of a file system. A file system tree structure is walked by parallel workers that write file system data to data storage devices in parallel streams. Work assigned to one worker may be split to an idle worker to provide parallel operation. Relationship data that maintains relationships between the streams is persisted (e.g., via reference data inserted into the streams) so that a restore mechanism is able to stitch the streams together back into the file system data. Also persisted is location data that maintains storage locations of the streams. A file's data also may be written into parallel streams.
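The stitching step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the record layout (`("data", ...)` and `("ref", ...)` tuples), the `Stream` class, the `locations` and `devices` dictionaries, and the device names are all hypothetical, chosen only to show how reference data inside a stream plus a separate location map could let a restore rebuild the original order.

```python
from dataclasses import dataclass, field


@dataclass
class Stream:
    """One backup stream (hypothetical layout).

    Each record is either ("data", payload) or ("ref", other_stream_id),
    where a "ref" marks the point at which another stream's content belongs.
    """
    stream_id: int
    records: list = field(default_factory=list)


def load_stream(stream_id, locations, devices):
    """Find a stream using the persisted location data (stream id -> device)."""
    return devices[locations[stream_id]][stream_id]


def stitch(stream_id, locations, devices):
    """Rebuild the ordered payloads by following references between streams."""
    restored = []
    for kind, value in load_stream(stream_id, locations, devices).records:
        if kind == "data":
            restored.append(value)
        else:  # "ref": splice in the referenced stream at this position
            restored.extend(stitch(value, locations, devices))
    return restored


# Two streams written in parallel to two devices (assumed names).
first = Stream(1, [("data", "/a"), ("ref", 2), ("data", "/c")])
second = Stream(2, [("data", "/b/x"), ("data", "/b/y")])
devices = {"tape0": {1: first}, "tape1": {2: second}}
locations = {1: "tape0", 2: "tape1"}

restored = stitch(1, locations, devices)
```

In this sketch the reference in stream 1 plays the role of the relationship data, and `locations` plays the role of the location data; a real restore would read both from persistent storage rather than from in-memory dictionaries.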
Publication number: US9384200(B1) Publication date: 2016.07.05
Application number: US201213723604 Filing date: 2012.12.21
Applicant: EMC Corporation Inventors: Batchu Ravi V.;Kaura Suchit;Nellore Hymanand;Yuan Hsing;Miller Jeff;Joshi Sandeep
Classification: G06F17/30;G06F15/16 Main classification: G06F17/30
Agency: Agent:
Main claim: 1. A method for backing up at least part of a file system represented by a tree structure, the method comprising: distributing work to workers to walk parts of the tree structure in a parallel tree walk, the workers generating multiple streams; writing file system data for backing up the at least part of the file system represented by the tree structure to data storage devices in parallel streams of the multiple streams, distributing the work to the workers comprises: identifying a work item assigned to a first worker; and splitting the work item assigned to the first worker between the first worker and a second worker based on whether the second worker is idle and whether a data storage device is idle, wherein splitting the work item comprises identifying an intermediary hash value, assigning the first worker a first set of hash values to process based on the intermediary hash value, and assigning the second worker a second set of hash values to process based on the intermediary hash value, wherein the first set of hash values and the second set of hash values each comprises a respective minimum hash value and a respective maximum hash value; and persisting relationship data that maintains relationships between the streams and location data that maintains storage locations of the streams, further comprising: assigning a first stream identifier to the first stream; assigning a second stream identifier to the second stream; and inserting a reference to the second stream identifier into the first stream.
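The work-splitting step in the claim can be sketched as follows. This is an illustration under stated assumptions, not the claimed method itself: the claim only requires "an intermediary hash value", so choosing the midpoint of the range is an assumption, as are the `WorkItem` class, the `split_work_item` function, the inclusive integer hash bounds, and the example directory path.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class WorkItem:
    """A slice of a directory walk, bounded by hash values (hypothetical)."""
    directory: str
    min_hash: int  # inclusive lower bound
    max_hash: int  # inclusive upper bound


def split_work_item(
    item: WorkItem, second_worker_idle: bool, device_idle: bool
) -> Tuple[WorkItem, Optional[WorkItem]]:
    """Split a work item only if a second worker and a storage device are idle.

    Returns (first_worker_item, second_worker_item); the second element is
    None when no split happens.
    """
    if not (second_worker_idle and device_idle):
        return item, None
    if item.min_hash >= item.max_hash:
        return item, None  # a single hash value cannot be divided further
    # Assumption: use the midpoint as the intermediary hash value.
    intermediary = (item.min_hash + item.max_hash) // 2
    first = WorkItem(item.directory, item.min_hash, intermediary)
    second = WorkItem(item.directory, intermediary + 1, item.max_hash)
    return first, second


item = WorkItem("/export/home", 0, 99)
first, second = split_work_item(item, second_worker_idle=True, device_idle=True)
unsplit, nothing = split_work_item(item, second_worker_idle=True, device_idle=False)
```

After the split, each resulting work item carries its own minimum and maximum hash value, and the two ranges together cover the original range without overlap, so no directory entry is walked twice or skipped.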
Address: Hopkinton MA US