Title: CACHE MANAGEMENT FOR MAP-REDUCE APPLICATIONS
Abstract: A computer manages a cache for a MapReduce application running on a distributed file system that includes one or more storage media. The computer receives a map request and the parameters for processing it: the total data size to be processed, the size of each data record, and the number of map requests executing simultaneously. The computer determines a cache size for processing the map request based on the received parameters and a machine learning model for map request cache size, and, based on the determined cache size, reads data from the one or more storage media of the distributed file system into the cache. The computer then processes the map request and writes the intermediate result data of the map request processing into the cache, based on the determined cache size.
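The abstract does not say which machine learning model is used, so the sketch below is only an illustration under stated assumptions. It swaps in k-nearest-neighbour regression over a hypothetical history of past runs (`Sample`), each labelled with a cache size that worked well, and uses the three parameters named in the abstract as features. The prediction is then clamped to a hypothetical per-node `memory_budget` shared equally among the simultaneously executing map requests, and rounded down to whole records. All names (`MapParams`, `Sample`, `predict_cache_size`, `determine_cache_size`, `memory_budget`) are illustrative, not taken from the patent.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class MapParams:
    total_data_size: int   # bytes to be processed by the map request
    record_size: int       # bytes per data record
    concurrent_maps: int   # map requests executing simultaneously


@dataclass(frozen=True)
class Sample:
    params: MapParams
    best_cache_size: int   # hypothetical label: cache size (bytes) that worked well in a past run


def _features(p: MapParams) -> List[float]:
    # Log-scale the byte sizes so that order of magnitude, not raw bytes, drives the distance.
    return [math.log2(p.total_data_size), math.log2(p.record_size), float(p.concurrent_maps)]


def predict_cache_size(history: Sequence[Sample], p: MapParams, k: int = 3) -> float:
    """k-nearest-neighbour regression: mean label of the k most similar past runs."""
    if not history or k < 1:
        raise ValueError("need at least one training sample and k >= 1")
    x = _features(p)
    nearest = sorted(history, key=lambda s: math.dist(x, _features(s.params)))[:k]
    return sum(s.best_cache_size for s in nearest) / len(nearest)


def determine_cache_size(history: Sequence[Sample], p: MapParams,
                         memory_budget: int, k: int = 3) -> int:
    """Clamp the model's prediction to what the node can give this one map request."""
    predicted = predict_cache_size(history, p, k)
    # Concurrent map requests share the memory budget equally; never cache more than the input.
    upper = min(memory_budget // p.concurrent_maps, p.total_data_size)
    size = min(int(predicted), upper)
    # Round down to whole records, but always hold at least one record.
    return max(p.record_size, size // p.record_size * p.record_size)


if __name__ == "__main__":
    history = [
        Sample(MapParams(1 << 30, 100, 4), 64 << 20),
        Sample(MapParams(1 << 30, 1000, 4), 128 << 20),
        Sample(MapParams(1 << 28, 100, 8), 16 << 20),
        Sample(MapParams(1 << 32, 1000, 2), 256 << 20),
    ]
    request = MapParams(total_data_size=1 << 30, record_size=100, concurrent_maps=4)
    print(determine_cache_size(history, request, memory_budget=1 << 30, k=1))  # 67108800
```

Keeping the learned prediction separate from the clamping step means the model can be replaced (for example by a regression model trained on real job logs) without changing the resource limits applied afterwards.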
Application publication number: US2016062900 (A1)    Publication date: 2016.03.03
Application number: US201514828600    Filing date: 2015.08.18
Applicant: International Business Machines Corporation    Inventors: Liu Liang; Qu Junmei; Zhu ChaoQiang; Zhuang Wei
Classification: G06F12/08; G06F17/30; G06N99/00    Main classification: G06F12/08
Main claim: 1. A method for managing a cache for a MapReduce application on a distributed file system, the method comprising: receiving, by a computer, a map request for a MapReduce application on a distributed file system that includes one or more storage media; receiving, by the computer, parameters for processing the map request, the parameters including a total data size to be processed, a size of each data record, and a number of map requests executing simultaneously; determining, by the computer, a cache size for processing the map request, wherein the cache size is determined based on the received parameters for processing the map request and a machine learning model for a map request cache size; reading, by the computer, based on the determined cache size, data from the one or more storage media of the distributed file system into the cache; processing, by the computer, the map request; and writing, by the computer, intermediate result data of the map request processing into the cache, based on the determined cache size.
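The last three steps of the claim (read into the cache, process, write intermediate results into the cache, each bounded by the determined cache size) can be sketched as follows. This is a minimal illustration, not the patented implementation: an in-memory byte stream stands in for a block of the distributed file system, the function `run_map_request` and its `map_fn` callback are hypothetical, the per-pair byte cost (key bytes plus 8 bytes for the value) is an assumed estimate, and a "run" flushed out of a full cache stands in for a spill to local storage, in the spirit of a map-side output buffer.

```python
import io
from typing import BinaryIO, Callable, Iterable, List, Tuple

KV = Tuple[str, int]


def run_map_request(source: BinaryIO, record_size: int, cache_size: int,
                    map_fn: Callable[[bytes], Iterable[KV]]) -> List[List[KV]]:
    """Process one map request with a cache of cache_size bytes.

    Returns the intermediate-result runs flushed out of the cache, in order.
    """
    if record_size < 1 or cache_size < record_size:
        raise ValueError("cache must hold at least one record")
    # Read whole records only: one read fills the cache with as many records as fit.
    read_size = cache_size // record_size * record_size
    runs: List[List[KV]] = []
    buffer: List[KV] = []
    buffered_bytes = 0

    def emit(record: bytes) -> None:
        nonlocal buffer, buffered_bytes
        for key, value in map_fn(record):
            pair_size = len(key.encode()) + 8  # assumed cost: key bytes + 8-byte int value
            if buffer and buffered_bytes + pair_size > cache_size:
                runs.append(buffer)  # cache full: flush (stand-in for a spill to local disk)
                buffer, buffered_bytes = [], 0
            buffer.append((key, value))
            buffered_bytes += pair_size

    pending = b""  # bytes of a record split across two reads
    while True:
        data = source.read(read_size)
        if not data:
            break
        chunk = pending + data
        whole = len(chunk) // record_size * record_size
        pending = chunk[whole:]
        for offset in range(0, whole, record_size):
            emit(chunk[offset:offset + record_size])
    if pending:
        emit(pending)  # trailing partial record at end of input
    if buffer:
        runs.append(buffer)
    return runs


if __name__ == "__main__":
    data = io.BytesIO(b"aaaabbbbaaaacccc")
    runs = run_map_request(data, record_size=4, cache_size=24, map_fn=lambda r: [(r.decode(), 1)])
    print(runs)  # [[('aaaa', 1), ('bbbb', 1)], [('aaaa', 1), ('cccc', 1)]]
```

In this sketch the same cache size bounds both the read granularity and the intermediate-result buffer; a larger cache means fewer reads and fewer flushed runs, which is the trade-off the determined cache size is meant to tune.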
Address: Armonk, NY, US