Title of Invention: Method and system to cloud-enabled large-scaled internet data mining and data analytics
Abstract: A method and procedure for large-scaled Internet data mining and data analytics for consumers over the cloud. The method describes an online marketplace that includes an authoring tool generating computer scripts, executing the script to acquire data from a URL, wherein the sequence of script instructions performs extraction and transformation of data, aggregating it into a dataset, and publishing it for data consumers to pair with data analytics software programs in analyzing the dataset.
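The acquisition flow in the abstract (run a script against a URL, extract and transform each record, aggregate the rows into a dataset) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and is not taken from the patent: the `Script` class, the `key=value;...` record format, the `fake_fetch` stand-in for a network call, and the example URL.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, str]


@dataclass
class Script:
    """Hypothetical authored script: a source URL plus ordered row transforms."""
    url: str
    steps: List[Callable[[Row], Row]] = field(default_factory=list)


def extract_row(raw: str) -> Row:
    # Assumed record format: "key=value" pairs separated by ';'.
    pairs = (item.split("=", 1) for item in raw.split(";") if "=" in item)
    return {key.strip(): value.strip() for key, value in pairs}


def run_script(script: Script, fetch: Callable[[str], List[str]]) -> List[Row]:
    """Fetch raw records, extract and transform each one, aggregate into a dataset."""
    dataset: List[Row] = []
    for raw in fetch(script.url):
        row = extract_row(raw)
        for step in script.steps:
            row = step(row)
        dataset.append(row)
    return dataset


def fake_fetch(url: str) -> List[str]:
    # Stand-in for a real HTTP request so the sketch runs offline.
    return ["city= Cupertino ;temp_f=68", "city=San Jose;temp_f=72"]


def to_celsius(row: Row) -> Row:
    # Example transformation step: replace temp_f with temp_c.
    out = dict(row)
    out["temp_c"] = str(round((float(out.pop("temp_f")) - 32) * 5 / 9, 1))
    return out


script = Script("https://example.com/weather", [to_celsius])
dataset = run_script(script, fake_fetch)
print(dataset)
```

Passing the fetch function in as a parameter keeps the script logic testable without network access; a real deployment would presumably also schedule repeated runs according to a rule, which this sketch leaves out.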
Publication No.: US9367590(B1) Publication Date: 2016.06.14
Application No.: US201314108136 Filing Date: 2013.12.17
Applicant: Chun Connie Inventor: Chun Connie
Classification: G06F17/30 Main Classification: G06F17/30
Agency: Agent:
Principal Claim: 1. A method to implement an online marketplace for large scale data mining and data analytics in a collaborative online social network environment, comprising: authoring automated scripts for computer to mine data from the Internet using Universal Resource Location (URL); retrieving, in response to a fetch data instruction, data with context that enable extraction and transformation to form a row of data record; automatically executing a script to acquire rows of data record; repeating execution of the script according to a given rule; aggregating a sequence of rows to form a dataset of a plurality of datasets that are provided by a plurality of data producers; publishing the plurality of datasets, for subsequent selection by a plurality of data consumers, in a dataset catalogue, wherein the dataset catalogue comprises a summary description of each of the plurality of datasets and associated with at least one of a plurality of data analytics software programs; presenting to a data; generating, by a computer processor and from the plurality of datasets, a selected dataset by matching dataset properties to a user-defined context defined by a data consumer of the plurality of data consumers; validating selection of the selected dataset based on a pre-determined criterion; defining data analytic algorithms of a plurality of data analytics software programs provided by a plurality of data analytics software providers; defining a required input parameter to each of the plurality of data analytics software programs; defining a data domain that is required as input to each of the data analytics algorithms; uploading the plurality of data analytics software programs to data analytics software program repository; publishing a purpose of each of the plurality of data analytics software programs in a data analytics software catalogue; generating, by the computer processor and from the plurality of data analytics software programs, a matching data analytics software that is compatible with the dataset properties of the selected dataset; executing the matching data analytics software program with the selected dataset to generate data analytics results for presenting to the data consumer; matching criteria of analytics provided by the data consumer; inspecting the dataset properties of the plurality of datasets in a dataset repository; inspecting data analytics properties in the data analytics software program repository; accepting input criteria based on the dataset properties of the plurality of datasets; further accepting the input criteria base on the data analytics properties of the plurality of data analytics software programs; generating, for including in the dataset catalogue, a pair of best matched dataset and data analytics software program by matching the dataset properties and the data analytics properties for similarities; producing an output of the pair of best matched dataset and data analytics software program; and recommending the pair of best matched dataset and data analytics software program to the data consumer, wherein the plurality of data producers, the plurality of data consumers, and the plurality of data analytics software providers form the collaborative online social network environment.
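The final steps of the claim pair a dataset with an analytics program "by matching the dataset properties and the data analytics properties for similarities" but do not say how similarity is measured. The sketch below is one possible reading, not the patented method: it assumes properties are plain string tags and scores each pair with Jaccard similarity. The function names, catalogue entries, and tags are all hypothetical.

```python
from itertools import product
from typing import Dict, FrozenSet, Tuple


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Share of tags common to both sets; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_pair(
    datasets: Dict[str, FrozenSet[str]],
    programs: Dict[str, FrozenSet[str]],
) -> Tuple[str, str, float]:
    """Return (dataset name, program name, score) for the highest-scoring pair."""
    if not datasets or not programs:
        raise ValueError("need at least one dataset and one program")
    scored = (
        (d_name, p_name, jaccard(d_props, p_props))
        for (d_name, d_props), (p_name, p_props)
        in product(datasets.items(), programs.items())
    )
    return max(scored, key=lambda pair: pair[2])


# Hypothetical catalogue entries, described by property tags.
datasets = {
    "retail_prices": frozenset({"numeric", "time_series", "daily"}),
    "product_reviews": frozenset({"text", "english"}),
}
programs = {
    "trend_forecaster": frozenset({"numeric", "time_series"}),
    "sentiment_scorer": frozenset({"text", "english", "short_form", "labelled"}),
}

best = best_pair(datasets, programs)
print(best)
```

Here `retail_prices` and `trend_forecaster` share two of three distinct tags, so that pair would be the one recommended to the data consumer. A production matcher would likely weight properties or check required input parameters as hard constraints rather than rely on tag overlap alone.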
Address: Cupertino CA US