Title of Invention: Collaborative Analytics Map Reduction Classification Learning Systems and Methods
Abstract: Disclosed herein are systems and methods for data learning and classification for rapidly processing extremely large volumes of input data using one or more computing devices, that are application and platform independent, participating in a distributed parallel processing environment. In one embodiment, a system may comprise a plurality of parallel Map Reduction Aggregation Processors operating on the one or more computing devices, and configured to receive different sets of input data for data aggregation. Each of the Map Reduction Aggregation Processors may comprise one or more parallel Mapping Operation Modules configured to consistently dissect the input data into individual intermediate units of mapping outputs comprising consistently mapped data keys, and any values related to mapped data keys, conducive to simultaneous parallel reduction processing; and one or more parallel Reduction Operation Modules configured to continually and simultaneously consume the mapping outputs by eliminating the matching keys and aggregating values consistent with a specified reduction operation for all matching keys that are encountered during consumption of the mapping outputs. The system may also include an application-specific Classification Metric Function Operations Module operating on the one or more computing devices and configured to receive reduction outputs from the Reduction Operations Modules to determine distance and/or similarity between each of the different sets of input data with respect to one or more data classification categories using one or more distance and/or similarity calculations.
Publication No.: US2014222736(A1) Publication Date: 2014.08.07
Application No.: US201414169689 Filing Date: 2014.01.31
Applicant: Drew Jacob Inventor: Drew Jacob
Classification: G06N99/00 Main Classification: G06N99/00
Agency: Agent:
Principal Claim: 1. A data learning and classification system for rapidly processing extremely large volumes of input data using one or more computing devices, that are application and platform independent, participating in a distributed parallel processing environment, the system comprising: a plurality of parallel Map Reduction Aggregation Processors operating on the one or more computing devices, and configured to receive different sets of input data for data aggregation, each of the Map Reduction Aggregation Processors comprising: one or more parallel Mapping Operation Modules configured to consistently dissect the input data into individual intermediate units of mapping outputs comprising consistently mapped data keys, and any values related to mapped data keys, conducive to simultaneous parallel reduction processing, and one or more parallel Reduction Operation Modules configured to continually and simultaneously consume the mapping outputs by eliminating the matching keys and aggregating values consistent with a specified reduction operation for all matching keys that are encountered during consumption of the mapping outputs; and an application-specific Classification Metric Function Operations Module operating on the one or more computing devices and configured to receive reduction outputs from the Reduction Operations Modules to determine distance and/or similarity between each of the different sets of input data with respect to one or more data classification categories using one or more distance and/or similarity calculations.
Address: Canton TX US
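Illustrative sketch (not part of the patent record): the claimed pipeline of mapping, reduction, and a classification metric can be pictured with a small Python example. Everything here is an assumption made for illustration only: the function names (map_tokens, reduce_counts, aggregate, cosine_similarity, classify) are hypothetical, whitespace-separated words stand in for the "mapped data keys", summation stands in for the "specified reduction operation", cosine similarity stands in for the "distance and/or similarity calculations", and a thread pool stands in for the parallel Map Reduction Aggregation Processors. The record itself does not specify any of these choices.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from math import sqrt


def map_tokens(text):
    """Mapping step (assumed): dissect input into (key, value) units.

    Each lowercase word is a key and its value is 1.
    """
    return [(word, 1) for word in text.lower().split()]


def reduce_counts(pairs):
    """Reduction step (assumed): merge matching keys by summing their values."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return totals


def aggregate(text):
    """One hypothetical 'aggregation processor': map, then reduce."""
    return reduce_counts(map_tokens(text))


def cosine_similarity(a, b):
    """Classification metric (assumed): cosine similarity of two key->count profiles."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def classify(inputs, categories):
    """Assign each input set to the category whose profile is most similar."""
    # Each data set is aggregated independently, so the work can run in parallel.
    with ThreadPoolExecutor() as pool:
        input_profiles = dict(zip(inputs, pool.map(aggregate, inputs.values())))
        category_profiles = dict(zip(categories, pool.map(aggregate, categories.values())))
    return {
        name: max(
            category_profiles,
            key=lambda c: cosine_similarity(profile, category_profiles[c]),
        )
        for name, profile in input_profiles.items()
    }


# Made-up example data, purely for demonstration.
categories = {
    "sports": "goal match team score goal",
    "finance": "stock market price stock trade",
}
inputs = {
    "doc1": "the team scored a late goal",
    "doc2": "stock price fell as market closed",
}
print(classify(inputs, categories))
```

Because each input set is mapped and reduced without reference to any other, the aggregation calls are independent of one another; a real deployment would presumably distribute them across machines rather than threads, which this sketch does not attempt.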