Title: File system optimization by log/metadata analysis
Abstract: A mechanism is provided in a data processing system for optimization of a distributed file system by log data analysis. A name node in the distributed file system initiates a log analysis map/reduce job on one or more connected data nodes via a map/reduce processing framework in the distributed file system and receives result data resulting from the log analysis map/reduce job from the one or more connected data nodes via the map/reduce processing framework in the distributed file system. The name node performs analysis on the received result data and generates an optimization plan for the one or more connected data nodes based on results of the analysis. The name node initiates the optimization plan on the one or more connected data nodes via the map/reduce processing framework in the distributed file system.
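The control loop in the abstract (initiate a log-analysis job, receive its results, analyze them, generate an optimization plan) can be sketched in Python as below. This is a minimal in-process illustration under stated assumptions, not the patented implementation: the log record format (operation, latency in ms), the functions map_log, reduce_latency, generate_plan and run_optimization_cycle, the 50 ms threshold, and the "rebalance" action are all hypothetical. The map and reduce phases run locally here rather than on a real map/reduce framework such as Hadoop.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical log record format on each data node: (operation, latency_ms).
# Real file system logs look different; this only illustrates the flow.


def map_log(node_id, records):
    """Map phase for one data node: emit (node_id, latency) for each read."""
    for op, latency in records:
        if op == "read":
            yield node_id, latency


def reduce_latency(pairs):
    """Reduce phase: group latencies by node and average them."""
    grouped = defaultdict(list)
    for node, latency in pairs:
        grouped[node].append(latency)
    return {node: mean(vals) for node, vals in grouped.items()}


def generate_plan(result, threshold_ms):
    """Name-node analysis (hypothetical rule): flag slow nodes for rebalancing."""
    return sorted(
        (node, "rebalance") for node, avg in result.items() if avg > threshold_ms
    )


def run_optimization_cycle(node_logs, threshold_ms=50.0):
    # 1. Initiate the log-analysis job: run the map phase over every node's log.
    mapped = [
        pair for node, recs in node_logs.items() for pair in map_log(node, recs)
    ]
    # 2. Receive the result data (here, the reduced per-node averages).
    result = reduce_latency(mapped)
    # 3. Analyze the results and generate the optimization plan.
    #    A real system would then dispatch the plan back to the data nodes.
    return generate_plan(result, threshold_ms)
```

For example, run_optimization_cycle({"dn1": [("read", 20), ("read", 30), ("write", 200)], "dn2": [("read", 80), ("read", 90)]}) returns [("dn2", "rebalance")], because only dn2's mean read latency (85 ms) exceeds the assumed threshold.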
Publication number: US8990294(B2)  Publication date: 2015.03.24
Application number: US201213449459  Filing date: 2012.04.18
Applicant: International Business Machines Corporation  Inventors: Hylick Anthony N.; Rawson, III Freeman L.; Van Hensbergen Eric
Classification: G06F15/16; G06F9/46; G06F17/30; G06F9/50  Main classification: G06F15/16
Agents: Tkacs Stephen R.; Walder, Jr. Stephen J.; Stock William J.
Principal claim: 1. A method, in a data processing system, for optimization of a map/reduce distributed file system by log data analysis, the method comprising:
initiating, by a name node in the distributed file system, a log analysis map/reduce job on one or more connected data nodes via a map/reduce processing framework in the distributed file system, wherein initiating a log analysis map/reduce job on the one or more connected data nodes comprises initiating a plurality of algorithms to run on the one or more connected data nodes, wherein the plurality of algorithms accomplish the same goal installed on each of the one or more connected data nodes;
receiving, by the name node, result data resulting from the log analysis map/reduce job from the one or more connected data nodes via the map/reduce processing framework in the distributed file system;
performing, by the name node, analysis on the received result data, wherein performing analysis on the received result data comprises performing data mining on the received result data to identify patterns to predict which algorithms among the plurality of algorithms have a best performance on each given data node or across data nodes within the one or more connected data nodes;
generating, by the name node, an optimization plan for the one or more connected data nodes based on results of the analysis and using the identified patterns, wherein generating the optimization plan comprises configuring each given data node within the one or more connected data nodes to use its corresponding predicted algorithm; and
initiating, by the name node, the optimization plan on the one or more connected data nodes via the map/reduce processing framework in the distributed file system.
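The per-node algorithm selection in claim 1 can be sketched as follows, assuming the log-analysis job returns one (data node, algorithm, runtime) record per observed run. The claim's data-mining step is replaced here by a plain lowest-mean-runtime comparison, which is a simplification rather than the claimed technique; the record format, the function names, and the algorithm labels algo_a and algo_b are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical result data from the log-analysis job:
# (data_node_id, algorithm_name, runtime_seconds) for each observed run.


def predict_best_algorithms(results):
    """Return {node: algorithm} with the lowest mean runtime on each node.

    Stand-in for the claim's data-mining step: a per-node average, not a
    learned model. On ties, the algorithm seen first is kept.
    """
    runtimes = defaultdict(list)
    for node, algo, runtime in results:
        runtimes[(node, algo)].append(runtime)
    best = {}
    for (node, algo), vals in runtimes.items():
        avg = mean(vals)
        if node not in best or avg < best[node][1]:
            best[node] = (algo, avg)
    return {node: algo for node, (algo, _) in best.items()}


def predict_best_overall(results):
    """Cluster-wide variant: the algorithm with the lowest mean runtime overall."""
    runtimes = defaultdict(list)
    for _node, algo, runtime in results:
        runtimes[algo].append(runtime)
    return min(runtimes, key=lambda a: mean(runtimes[a]))


def generate_plan(best):
    """Optimization plan: configure each data node to use its predicted algorithm."""
    return [("configure", node, algo) for node, algo in sorted(best.items())]
```

With results = [("dn1", "algo_a", 1.0), ("dn1", "algo_a", 2.0), ("dn1", "algo_b", 2.5), ("dn2", "algo_a", 5.0), ("dn2", "algo_b", 1.0)], the per-node prediction picks algo_a for dn1 and algo_b for dn2, while the cluster-wide variant picks algo_b. This shows that the two readings of "on each given data node or across data nodes" can yield different plans.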
Address: Armonk, NY, US