Title of invention: Managing distributed system performance using accelerated data retrieval operations
Abstract: A distributed system is adapted to manage the performance of distributed processes. In one aspect, multiple stripes associated with a data item are stored in a distributed storage. The stored stripes include one or more stripes of redundancy information for the data item. A distributed process including at least one task is performed. During performance of the distributed process, a determination is made as to whether to perform an accelerated data retrieval operation. Responsive to a determination to perform an accelerated data retrieval operation, at least one of the one or more stripes of redundancy information for the data item is requested from the distributed storage. Other stripes associated with the data item may also be requested from the distributed storage. After a sufficient subset of stripes associated with the data item is received, the data item is reconstructed using the subset.
Publication number: US9444889(B1)  Publication date: 2016.09.13
Application number: US201313763459  Filing date: 2013.02.08
Applicant: Quantcast Corporation  Inventors: Rus Silvius V.; Molina-Estolano Esteban
Classification: G06F15/173; H04L29/08  Main classification: G06F15/173
Agency: Fenwick & West LLP  Agents: Fenwick & West LLP; Reasoner Robin W.; Jacowitz Renee D.
Principal claim: 1. A computer-implemented method for managing performance of a distributed system, the method comprising: storing, in a plurality of storage devices of a distributed storage, a plurality of stripes associated with a data item, the plurality of stripes generated according to a coding scheme, wherein the coding scheme generates a number of stripes associated with the data item that is more than a minimum number of stripes needed to assemble the data item, and wherein the plurality of stripes includes at least one redundancy stripe; performing a distributed process including a task that requires retrieval of the data item from the distributed storage; determining a processing speed associated with the task; and responsive to determining the processing speed does not meet a threshold, performing an accelerated data retrieval operation by requesting more than the minimum number of stripes needed to assemble the data item from at least two of the plurality of storage devices of the distributed storage; receiving at least the minimum number of stripes; and reconstructing the data item from the minimum number of stripes according to the coding scheme.
Address: San Francisco, CA, US
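
Illustrative sketch: the steps recited in the principal claim can be approximated in a few lines of Python. This is a minimal sketch under stated assumptions, not the patented implementation. The coding scheme here is a simple XOR parity code (k data stripes plus one redundancy stripe, any k of which suffice), chosen only because it is short; the record does not name a specific code. Storage devices are simulated as dictionaries with an artificial latency, and "does not meet a threshold" is assumed to mean "is lower than the threshold". All names (`encode`, `reconstruct`, `fetch`, `should_accelerate`, `retrieve`) are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list[bytes]:
    """Split data into k data stripes plus one XOR redundancy stripe (assumed code)."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    stripes = [padded[i * size:(i + 1) * size] for i in range(k)]
    stripes.append(reduce(xor_bytes, stripes))  # stripe index k holds the parity
    return stripes


def reconstruct(received: dict[int, bytes], k: int, length: int) -> bytes:
    """Rebuild the data item from any k of the k + 1 stripes."""
    received = dict(received)
    missing = [i for i in range(k) if i not in received]
    if len(missing) > 1 or (missing and k not in received):
        raise ValueError("not enough stripes to reconstruct")
    if missing:
        # XOR of the parity stripe and the other data stripes yields the missing one.
        received[missing[0]] = reduce(xor_bytes, list(received.values()))
    return b"".join(received[i] for i in range(k))[:length]


def fetch(device: dict, index: int) -> tuple[int, bytes]:
    """Simulated read from one storage device (latency is artificial)."""
    time.sleep(device["latency"])
    return index, device["stripe"]


def should_accelerate(processing_speed: float, threshold: float) -> bool:
    # Assumption: the speed fails the threshold when it is below it.
    return processing_speed < threshold


def retrieve(devices: list[dict], k: int, length: int, accelerated: bool) -> bytes:
    # Accelerated: request every stripe, including redundancy; otherwise only k.
    wanted = range(len(devices)) if accelerated else range(k)
    received: dict[int, bytes] = {}
    pool = ThreadPoolExecutor(max_workers=len(devices))
    futures = [pool.submit(fetch, devices[i], i) for i in wanted]
    for future in as_completed(futures):
        index, stripe = future.result()
        received[index] = stripe
        if len(received) == k:  # the minimum number of stripes has arrived
            break
    pool.shutdown(wait=False)  # do not wait for the straggler
    return reconstruct(received, k, length)


if __name__ == "__main__":
    item = b"distributed data item"
    k = 3
    latencies = [0.01, 0.2, 0.01, 0.01]  # device 1 is a simulated straggler
    devices = [{"stripe": s, "latency": l} for s, l in zip(encode(item, k), latencies)]
    accelerate = should_accelerate(processing_speed=5.0, threshold=10.0)
    print(retrieve(devices, k, len(item), accelerate))
```

In this sketch the accelerated path can return as soon as the three fast devices respond, because the redundancy stripe stands in for the stripe held by the slow device; the non-accelerated path has to wait for the straggler.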