Title: Optimizing distributed data analytics for shared storage
Abstract: Methods, systems, and computer-executable instructions for performing distributed data analytics are provided. In one exemplary embodiment, a method of performing a distributed data analytics job includes collecting application-specific information in a processing node assigned to perform a task, in order to identify the data necessary to perform the task. The method also includes requesting a chunk of the necessary data from a storage server based on location information indicating one or more locations of the data chunk, and prioritizing the request relative to other data requests associated with the job. The method further includes receiving the data chunk from the storage server in response to the request and storing it in a memory cache of the processing node; the memory cache uses the same file system as the storage server.
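The abstract does not say how requests are prioritized or how one of the chunk's locations is chosen. The following Python sketch shows one possible reading. It keeps a node's pending chunk requests in a priority queue and, when dispatching, sends each request to the least-loaded server among the chunk's listed locations. The names `ChunkRequest` and `RequestScheduler`, the integer priority scale (lower is served first), and the least-loaded selection rule are all hypothetical assumptions. None of them is the patented method.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class ChunkRequest:
    priority: int                            # lower value is served first (assumption)
    seq: int                                 # tie-breaker: FIFO within one priority level
    chunk_id: str = field(compare=False)
    locations: list = field(compare=False)   # storage servers reported to hold the chunk


class RequestScheduler:
    """Hypothetical per-node queue that orders a job's outstanding chunk requests."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()
        self._outstanding = {}  # server name -> requests dispatched but not yet completed

    def submit(self, chunk_id, locations, priority):
        req = ChunkRequest(priority, next(self._seq), chunk_id, list(locations))
        heapq.heappush(self._heap, req)

    def dispatch(self):
        """Pop the highest-priority request and route it to its least-loaded location."""
        if not self._heap:
            return None
        req = heapq.heappop(self._heap)
        # Ties on load are broken by server name so the choice is deterministic.
        server = min(req.locations, key=lambda s: (self._outstanding.get(s, 0), s))
        self._outstanding[server] = self._outstanding.get(server, 0) + 1
        return req.chunk_id, server

    def complete(self, server):
        """Record that a previously dispatched request to `server` has finished."""
        self._outstanding[server] -= 1
```

Suppose `blk-7` is submitted at priority 2, then `blk-3` at priority 0, then `blk-9` at priority 2. The scheduler dispatches `blk-3` first. It then dispatches `blk-7` and `blk-9` in submission order and spreads them across the two replica servers.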
Publication No.: US9456049 (B2); Publication Date: 2016.09.27
Application No.: US201514814445; Filing Date: 2015.07.30
Applicant: NetApp, Inc.; Inventors: Soundararajan Gokul; Mihailescu Madalin
Classification: G06F9/46; H04L29/08; G06F17/30; G06F9/50; G06F12/08; Main Classification: G06F9/46
Agency: LeClairRyan, a Professional Corporation; Agent: LeClairRyan, a Professional Corporation
Principal Claim: 1. A method of performing a distributed data analytics job, the method comprising: collecting, by a processing node of a distributed data analytics computer system, application-specific information to identify necessary data to perform a first task of the data analytics job; requesting, by the processing node, a data chunk of the necessary data from a storage server based on location information indicating one or more locations of the data chunk and prioritizing the request relative to other data requests associated with the job; and receiving, by the processing node, the data chunk from the storage server at the processing node in response to the request and storing the data chunk in a memory cache of the processing node, the memory cache of the processing node using a same type of file system as the storage server.
Address: Sunnyvale, CA, US
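The claim requires only that the node's memory cache use the same type of file system as the storage server. The following is a minimal sketch of one way a node could take advantage of that. It assumes two things: chunks are identified by a (path, offset) pair in the shared namespace, and eviction is least-recently-used. Under those assumptions, a task looks up a chunk with the same identifiers it would send to the server, and it falls back to a fetch on a miss. `NodeChunkCache`, `read_chunk`, and the `fetch_from_server` callback are hypothetical names. A real implementation would likely live in the file-system layer rather than in application code.

```python
from collections import OrderedDict


class NodeChunkCache:
    """Hypothetical in-memory chunk cache on a processing node.

    Entries are keyed by the (path, offset) pair used by the storage server's
    file system, so lookups use the same identifiers as server requests.
    Eviction is least-recently-used (an assumption, not taken from the claim).
    """

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self._entries = OrderedDict()  # (path, offset) -> chunk bytes

    def put(self, path, offset, data):
        key = (path, offset)
        if key in self._entries:
            self.used -= len(self._entries.pop(key))
        if len(data) > self.capacity:
            return False  # chunk is larger than the whole cache; do not store it
        while self.used + len(data) > self.capacity:
            _, evicted = self._entries.popitem(last=False)  # drop least recently used
            self.used -= len(evicted)
        self._entries[key] = data
        self.used += len(data)
        return True

    def get(self, path, offset):
        key = (path, offset)
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]


def read_chunk(cache, path, offset, fetch_from_server):
    """Serve a chunk from the node cache, fetching from the storage server on a miss."""
    data = cache.get(path, offset)
    if data is None:
        data = fetch_from_server(path, offset)
        cache.put(path, offset, data)
    return data
```

The cache and the server share one naming scheme, so no separate mapping from cache keys to server locations is needed in this sketch.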