Title: Collecting and aggregating datasets for analysis
Abstract: Systems and methods are disclosed for facilitating the collection and aggregation of machine- or user-generated datasets for analysis. One embodiment includes collecting a dataset on a machine on which the dataset is received or generated, wherein the dataset is collected from a data source on the machine; aggregating the dataset collected from the data source at a receiving location; performing analytics on the dataset upon collection or aggregation; and/or writing the dataset aggregated at the receiving location to a storage location.
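The four steps named in the abstract (collect on the originating machine, aggregate at a receiving location, run analytics, write to storage) can be illustrated with the minimal Python sketch below. It is not the applicant's implementation: the function names `collect`, `aggregate`, `analyze`, and `write`, the in-memory lists standing in for data sources and storage, and the "count by first field" analytic are all hypothetical choices made for illustration.

```python
from collections import Counter
from typing import Iterable, List

# Hypothetical stand-ins: plain lists play the role of an on-machine data
# source (e.g. log lines) and of the storage location.

def collect(source: Iterable[str]) -> List[str]:
    """Collect records on the machine where they are generated (agent side).

    Blank lines are dropped and trailing newlines stripped.
    """
    return [line.rstrip("\n") for line in source if line.strip()]

def aggregate(batches: Iterable[List[str]]) -> List[str]:
    """Merge batches collected on several machines at one receiving location."""
    merged: List[str] = []
    for batch in batches:
        merged.extend(batch)
    return merged

def analyze(records: List[str]) -> Counter:
    """Example analytic (assumed): count records by their first field."""
    return Counter(r.split()[0] for r in records)

def write(records: List[str], storage: List[str]) -> int:
    """Append aggregated records to the storage location; return the count written."""
    storage.extend(records)
    return len(records)

if __name__ == "__main__":
    web = ["GET /index\n", "POST /login\n", "\n"]
    db = ["GET /status\n"]
    merged = aggregate([collect(web), collect(db)])
    storage: List[str] = []
    print(analyze(merged), write(merged, storage))
```

Here analytics run on the aggregated batch; because the abstract allows analytics "upon collection or aggregation", `analyze` could equally be applied to the output of `collect` on each machine.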
Publication Number: US9082127 (B2)
Publication Date: 2015-07-14
Application Number: US201012877878
Filing Date: 2010-09-08
Applicant: Cloudera, Inc.
Inventors: Hsieh Jonathan Ming-Cyn; Robinson Henry Noel
Classification: G06F17/30; G06Q30/02
Main Classification: G06F17/30
Agency: Perkins Coie LLP
Agent: Perkins Coie LLP
Principal Claim:
1. A method of facilitating collection and aggregation of machine- or user-generated dataset for analysis, the method comprising:
    collecting, by a compute node, the dataset from a data source on a machine, wherein the dataset is received or generated on the machine;
    transmitting the dataset from the data source toward a receiving location by steps including:
        recording the dataset as an event using a data model;
        extracting a timestamp from the dataset;
        specifying, based on the timestamp, a priority of the event in a priority field included in the data model;
        specifying, based on the priority, the event in the data model with an attribute in a metadata table included in the data model, wherein the attribute includes a map that directs how the event is to be streamed to a subsequent machine, wherein the metadata table included in the data model is extensible to add additional attributes to the event by the subsequent machine, which is configured to further process the dataset as the event is streamed from the data source to the receiving location;
    aggregating the dataset collected from the data source at the receiving location, wherein the receiving location is dynamically updated by the compute node responsive to receiving the configuration information from a master node; and
    performing analytics on the dataset responsive to collecting or aggregating the dataset on the machine;
    wherein the dataset aggregated at the receiving location is written to a storage location, and wherein the dataset is stored redundantly on a distributed file system at the storage location.
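The event data model recited in claim 1 (a timestamp extracted from the record, a priority derived from that timestamp, a metadata table whose routing map is chosen by priority and can be extended downstream, and a receiving location updated from master-node configuration) can be sketched in Python as below. This is an illustrative sketch under stated assumptions, not the claimed system: the `Event`/`Agent` classes, the ISO-8601 timestamp prefix, the "older than 60 s is LOW" priority rule, the `ROUTES` table, and the `sink` configuration key are all hypothetical. Redundant storage on a distributed file system is not modeled.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict

@dataclass
class Event:
    body: str
    timestamp: datetime
    priority: str = "NORMAL"  # priority field of the data model
    # Metadata table: attribute name -> value; downstream nodes may add entries.
    attributes: Dict[str, Any] = field(default_factory=dict)

# Hypothetical routing maps keyed by priority: where the event goes next.
ROUTES = {"HIGH": {"next_hop": "collector-fast"},
          "LOW": {"next_hop": "collector-bulk"}}

def extract_timestamp(line: str) -> datetime:
    """Assumes each record starts with an ISO-8601 timestamp and a space."""
    return datetime.fromisoformat(line.split(" ", 1)[0])

def assign_priority(ts: datetime, now: datetime, stale_after_s: float = 60.0) -> str:
    """Hypothetical rule: events older than stale_after_s seconds are LOW."""
    return "LOW" if (now - ts).total_seconds() > stale_after_s else "HIGH"

def make_event(line: str, now: datetime) -> Event:
    ts = extract_timestamp(line)
    prio = assign_priority(ts, now)
    ev = Event(body=line, timestamp=ts, priority=prio)
    # Copy the map so downstream edits do not mutate the shared ROUTES table.
    ev.attributes["route"] = dict(ROUTES[prio])
    return ev

def downstream_annotate(ev: Event, node: str) -> Event:
    """A subsequent machine extends the metadata table with its own attribute."""
    ev.attributes.setdefault("hops", []).append(node)
    return ev

class Agent:
    """Compute node whose receiving location can be changed at runtime."""

    def __init__(self, sink: str) -> None:
        self.sink = sink  # current receiving location

    def on_config(self, config: Dict[str, str]) -> None:
        """Apply configuration pushed by a (hypothetical) master node."""
        self.sink = config.get("sink", self.sink)

if __name__ == "__main__":
    now = datetime(2010, 9, 8, 12, 0, 30, tzinfo=timezone.utc)
    ev = make_event("2010-09-08T12:00:00+00:00 disk full", now)
    print(ev.priority, ev.attributes["route"])
```

Keeping the metadata table as an open dictionary, rather than fixed fields, is one simple way to satisfy the claim's requirement that a subsequent machine can add attributes without changing the event schema.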
Address: Palo Alto, CA, US