Title: Compiler-guided software accelerator for iterative HADOOP® jobs
Abstract: Various methods are provided for a compiler-guided software accelerator for iterative HADOOP® jobs. One method includes identifying intermediate data that is generated by an iterative HADOOP® application, is smaller than a predetermined threshold size, and is used for less than a predetermined threshold time period. This intermediate data is stored in a memory device. The method further includes minimizing input, output, and synchronization overhead for the intermediate data by selectively using, at any given time, either a Message Passing Interface or a Distributed File System as the communication layer. The Message Passing Interface is co-located with the HADOOP® Distributed File System.
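The abstract states the selection criteria (size below a threshold, use below a time threshold) but gives no threshold values and no implementation. The following minimal Python sketch illustrates that selection rule under stated assumptions. The names `Layer`, `IntermediateData`, and `choose_layer`, along with both threshold values, are hypothetical, and no real MPI or HDFS API is called. The sketch only decides which layer a piece of intermediate data would travel over.

```python
from dataclasses import dataclass
from enum import Enum


class Layer(Enum):
    MPI = "mpi"  # in-memory message passing (small, short-lived data)
    DFS = "dfs"  # HADOOP Distributed File System (everything else)


@dataclass(frozen=True)
class IntermediateData:
    name: str
    size_bytes: int
    lifetime_s: float  # how long the data is used after being produced


# Hypothetical values: the patent only says the thresholds are "predetermined".
SIZE_THRESHOLD_BYTES = 64 * 1024 * 1024
LIFETIME_THRESHOLD_S = 30.0


def choose_layer(data: IntermediateData,
                 size_threshold: int = SIZE_THRESHOLD_BYTES,
                 lifetime_threshold: float = LIFETIME_THRESHOLD_S) -> Layer:
    """Route small, short-lived intermediate data over MPI; otherwise use the DFS."""
    if data.size_bytes < size_threshold and data.lifetime_s < lifetime_threshold:
        return Layer.MPI
    return Layer.DFS
```

For example, a 4 KB set of centroids consumed within two seconds would be routed over MPI, while a 1 GB output, or small data kept for an hour, would still go to the DFS.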
Publication number: US9201638(B2)    Publication date: 2015.12.01
Application number: US201313923458    Filing date: 2013.06.21
Applicant: NEC Laboratories America, Inc.    Inventors: Ravi Nishkam; Verma Abhishek; Chakradhar Srimat T.
Classification: G06F9/45; G06F9/52; G06F9/54    Main classification: G06F9/45
Agent: Kolodka Joseph
Main claim: 1. A method, comprising: identifying a set of map tasks and reduce tasks capable of being reused across multiple iterations of an iterative HADOOP® application; and reducing a system load imparted on a computer system executing the iterative HADOOP® application by transforming a source code of the iterative HADOOP® application to launch the map tasks in the set only once and keep the map tasks in the set alive for an entirety of the execution of the iterative HADOOP® application; wherein the map tasks in the set are kept alive for the entirety of the execution by guarding an invocation to a runJob( ) function beginning at a first iteration of the iterative HADOOP® application to prevent a re-launching of any of the map tasks and reduce tasks in the set in subsequent iterations of the iterative HADOOP® application, the invocation to the runJob( ) function being guarded by a flag, which is set to true for the first iteration and false for the subsequent iterations.
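The flag-guarded launch in the claim can be illustrated with a small simulation. This is a minimal Python sketch, not the patented source transformation and not the Hadoop API. `PersistentMapTasks`, `run_job`, and `iterative_driver` are hypothetical stand-ins: `run_job` plays the role of the guarded runJob( ) invocation, and the per-iteration work (halving a number) is invented purely for illustration. The sketch also omits how real long-lived map tasks would wait for each iteration's input.

```python
class PersistentMapTasks:
    """Hypothetical stand-in for map/reduce tasks that stay alive across iterations."""

    def __init__(self) -> None:
        self.launch_count = 0        # how many times the tasks were (re)launched
        self.iterations_served = 0   # how many iterations the live tasks processed

    def launch(self) -> None:
        self.launch_count += 1

    def process(self, state: float) -> float:
        self.iterations_served += 1
        return state / 2  # invented per-iteration work


def run_job(tasks: PersistentMapTasks) -> None:
    """Hypothetical analogue of runJob( ): launches the task set."""
    tasks.launch()


def iterative_driver(num_iterations: int, initial_state: float = 1024.0,
                     guarded: bool = True) -> tuple[PersistentMapTasks, float]:
    tasks = PersistentMapTasks()
    first_iteration = True  # flag: true for the first iteration only
    state = initial_state
    for _ in range(num_iterations):
        # Guarded: launch only on the first iteration. Unguarded: relaunch every time.
        if first_iteration or not guarded:
            run_job(tasks)
        first_iteration = False  # false for all subsequent iterations
        state = tasks.process(state)
    return tasks, state
```

With `tasks, state = iterative_driver(5)`, the tasks are launched once (`tasks.launch_count == 1`) yet serve all five iterations, and `state` ends at `32.0`. Passing `guarded=False` yields five launches for the same work, which is the repeated launch overhead the claim's flag is meant to avoid.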
Address: Princeton, NJ, US