Independent claim |
1. A method, in a data processing system, for optimization of a map/reduce distributed file system by log data analysis, the method comprising:
initiating, by a name node in the distributed file system, a log analysis map/reduce job on one or more connected data nodes via a map/reduce processing framework in the distributed file system, wherein initiating the log analysis map/reduce job on the one or more connected data nodes comprises initiating a plurality of algorithms to run on the one or more connected data nodes, wherein the plurality of algorithms accomplish the same goal and are installed on each of the one or more connected data nodes;
receiving, by the name node, result data resulting from the log analysis map/reduce job from the one or more connected data nodes via the map/reduce processing framework in the distributed file system;
performing, by the name node, analysis on the received result data, wherein performing analysis on the received result data comprises performing data mining on the received result data to identify patterns to predict which algorithms among the plurality of algorithms have a best performance on each given data node or across data nodes within the one or more connected data nodes;
generating, by the name node, an optimization plan for the one or more connected data nodes based on results of the analysis and using the identified patterns, wherein generating the optimization plan comprises configuring each given data node within the one or more connected data nodes to use its corresponding predicted algorithm; and
initiating, by the name node, the optimization plan on the one or more connected data nodes via the map/reduce processing framework in the distributed file system. |
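
The analysis and plan-generation steps recited above can be illustrated with a minimal Python sketch. It is not the claimed implementation. It assumes the result data arrives as hypothetical `(data_node, algorithm, elapsed_seconds)` records, and it substitutes a plain comparison of average elapsed times for the data mining and pattern identification the claim recites. The node names, algorithm names, timings, and the functions `analyze` and `generate_plan` are all invented for illustration, and the map/reduce framework itself is not modeled.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical result records returned by the log analysis job:
# (data_node, algorithm, elapsed_seconds). All values are made up.
SAMPLE_RESULTS = [
    ("node-a", "algo-1", 12.0),
    ("node-a", "algo-1", 14.0),
    ("node-a", "algo-2", 9.0),
    ("node-b", "algo-1", 7.0),
    ("node-b", "algo-2", 11.0),
    ("node-b", "algo-2", 10.0),
]


def analyze(results):
    """Average the elapsed time for each (node, algorithm) pair.

    Assumption: a simple mean stands in for the data mining step.
    """
    samples = defaultdict(list)
    for node, algorithm, elapsed in results:
        samples[(node, algorithm)].append(elapsed)
    return {key: mean(values) for key, values in samples.items()}


def generate_plan(averages):
    """Map each node to the algorithm with the lowest average elapsed time."""
    plan = {}
    for (node, algorithm), avg in averages.items():
        best = plan.get(node)
        if best is None or avg < averages[(node, best)]:
            plan[node] = algorithm
    return plan


plan = generate_plan(analyze(SAMPLE_RESULTS))
print(plan)
```

In this sketch the resulting plan is a dictionary from node name to the selected algorithm. A fuller version would also have to push that configuration back to each data node, which is the final step of the claim and is omitted here.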