Title: System, method and computer program for preparing data for analysis
Abstract: A method of preparing data for analysis, comprising the steps of: receiving an initial data set including a plurality of records, each of the plurality of records including an identifier attribute and an associative attribute that identifies one or more further records; receiving the further one or more records identified by the associative attribute in each of the plurality of records; and associating the further one or more records with the initial data set to form a final data set.
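The three abstract steps can be sketched as below. This is a minimal illustration, not the patented implementation. It assumes a hypothetical record shape `{"id": ..., "refs": [...]}` and a hypothetical `fetch` callback standing in for retrieval from storage or a network. It identifies the further records as in the main claim: associative values that match no identifier already present in the input set. Like the claim, it makes a single retrieval pass.

```python
from typing import Any, Callable, Dict, Iterable, List, Set

# Hypothetical record shape: {"id": str, "refs": [str, ...]}
Record = Dict[str, Any]


def find_missing_ids(records: Iterable[Record]) -> Set[str]:
    """Associative values that match no identifier in the input set."""
    records = list(records)
    known = {r["id"] for r in records}
    referenced = {ref for r in records for ref in r["refs"]}
    return referenced - known


def prepare(records: List[Record],
            fetch: Callable[[Set[str]], List[Record]]) -> List[Record]:
    """Fetch the missing records once and append them to form the final set."""
    missing = find_missing_ids(records)
    further = fetch(missing) if missing else []  # skip retrieval if nothing is missing
    return records + further


# Usage: an in-memory dict plays the role of the external record source.
store = {"C": {"id": "C", "refs": []}, "D": {"id": "D", "refs": ["A"]}}
initial = [{"id": "A", "refs": ["B", "C"]}, {"id": "B", "refs": ["D"]}]
final = prepare(initial, lambda ids: [store[i] for i in sorted(ids) if i in store])
print([r["id"] for r in final])  # ['A', 'B', 'C', 'D']
```

A set difference keeps the comparison linear in the number of records and references. The claim instead describes comparing each associative value against every identifier, which yields the same result.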
Publication number: US9098573 (B2)   Publication date: 2015.08.04
Application number: US201113179439   Filing date: 2011.07.08
Applicant: Patent Analytics Holding Pty Ltd   Inventor: Spielthenner Doris
Classification: G06F17/30   Main classification: G06F17/30
Agency: Alston & Bird LLP   Agent: Alston & Bird LLP
Main claim: 1. A computer-implemented method of automated preparation of a set of data for analysis, the method comprising the steps of:
(a) retrieving from non-volatile memory or via a communication network an input data set including a plurality of records, each of the plurality of records including an identifier attribute value and at least one associative attribute value, the identifier attribute value being an identifier for the record and each associative attribute value being an identifier of another record;
(b) identifying one or more further records to retrieve by comparing each associative attribute value from each of the plurality of data records in the input data set with the identifier attribute values of each of the plurality of records in the input data set, to determine any associative attribute values not matching any one of the identifier attribute values of records in the input data set, the associative attribute values not matching any one of the identifier attribute values being identifier attribute values of the one or more further records to retrieve;
(c) retrieving from non-volatile memory or via a communication network the identified further records;
(d) associating the retrieved one or more further records with the input data set to form a final data set;
(e) forming one or more networks from records of the final data set by:
  (f) determining a plurality of direct connections, each direct connection linking one record of the final data set to another record of the final data set as a pair of data records, where the identifier attribute value of one record of the pair is the same as an associative attribute value of the other record of the pair, so that data records sharing at least one common identifier attribute value form one or more networks of data records; and
  (g) determining a set of records for each of the one or more networks based on the direct connections, where each record of the network forms a pair via a direct connection with at least one other record in the network; and
(h) for at least one network, reducing the number of data records in the set of records for the network by performing the steps of:
  (i) identifying at least one pair of data records in the network having a direct connection and, for each selected pair of data records:
    (j) identifying any one or more indirect connections linking the pair of data records via one or more intermediate records, each indirect connection consisting of x direct connections forming an x degree of separation link between the pair of data records, where x is a variable number between 2 and a maximum value n, and the value of x may be different for each indirect connection;
    (k) counting the number of indirect connections identified for each pair of records to derive a total count for the links between the data records of the pair;
  (l) setting a threshold level for the total count value between each pair of data records;
  (m) removing the direct connection of any pair of data records having a total count value below the set threshold level; and
  (n) removing all data records that then no longer have a direct connection to any other record in the network; and
(o) processing at least one of the one or more networks for further analysis or display.
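Steps (e) to (n) of the claim can be sketched as follows. This is an illustrative reading under several stated assumptions, not the patented implementation:

* The record shape `{"id": ..., "refs": [...]}` is hypothetical.
* Direct connections are treated as undirected pairs.
* A network is a connected component of those pairs.
* An "indirect connection" of degree x is a simple path of x direct connections (2 ≤ x ≤ n) between the pair.
* The values of `n` and `threshold` used in the usage example are arbitrary. The claim leaves them open.

```python
from collections import defaultdict
from typing import Any, Dict, FrozenSet, List, Set, Tuple

Record = Dict[str, Any]   # hypothetical shape: {"id": str, "refs": [str, ...]}
Edge = FrozenSet[str]     # an unordered pair of record ids (one direct connection)


def direct_connections(records: List[Record]) -> Set[Edge]:
    """Step (f): pair two records when one record's id is an associative value of the other."""
    ids = {r["id"] for r in records}
    edges: Set[Edge] = set()
    for r in records:
        for ref in r["refs"]:
            if ref in ids and ref != r["id"]:
                edges.add(frozenset((r["id"], ref)))
    return edges


def adjacency(edges: Set[Edge]) -> Dict[str, Set[str]]:
    adj: Dict[str, Set[str]] = defaultdict(set)
    for edge in edges:
        a, b = tuple(edge)
        adj[a].add(b)
        adj[b].add(a)
    return adj


def networks(edges: Set[Edge]) -> List[Set[str]]:
    """Step (g): each network is a connected component of the direct connections."""
    adj = adjacency(edges)
    seen: Set[str] = set()
    result: List[Set[str]] = []
    for start in adj:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # iterative depth-first traversal
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(adj[node] - component)
        seen |= component
        result.append(component)
    return result


def count_indirect(adj: Dict[str, Set[str]], u: str, v: str, n: int) -> int:
    """Steps (j)-(k): count simple paths from u to v made of 2..n direct connections."""
    count = 0

    def walk(node: str, visited: Set[str], length: int) -> None:
        nonlocal count
        for nxt in adj[node]:
            if nxt == v:
                if length + 1 >= 2:  # the direct u-v link itself is not counted
                    count += 1
            elif nxt not in visited and length + 1 < n:
                walk(nxt, visited | {nxt}, length + 1)

    walk(u, {u}, 0)
    return count


def prune(edges: Set[Edge], n: int, threshold: int) -> Tuple[Set[Edge], Set[str]]:
    """Steps (l)-(n): drop links whose count is below threshold, then isolated records."""
    adj = adjacency(edges)
    kept: Set[Edge] = set()
    for edge in edges:
        u, v = tuple(edge)
        # Counts come from the unpruned network, so removal order does not matter.
        if count_indirect(adj, u, v, n) >= threshold:
            kept.add(edge)
    remaining = {rid for edge in kept for rid in edge}
    return kept, remaining


# Usage: a triangle A-B-C with a pendant D, plus a separate pair E-F.
records = [
    {"id": "A", "refs": ["B", "C"]}, {"id": "B", "refs": ["C"]},
    {"id": "C", "refs": []}, {"id": "D", "refs": ["C"]},
    {"id": "E", "refs": ["F"]}, {"id": "F", "refs": []},
]
edges = direct_connections(records)
print(sorted(sorted(c) for c in networks(edges)))  # [['A', 'B', 'C', 'D'], ['E', 'F']]
net = {e for e in edges if e <= {"A", "B", "C", "D"}}
kept, remaining = prune(net, n=3, threshold=1)
print(sorted(remaining))  # ['A', 'B', 'C']
```

In the example, each triangle link has one indirect connection of degree 2. The C-D link has none, so it falls below the threshold and D is dropped. Path enumeration grows quickly with `n`, so a small maximum degree of separation is assumed here.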
Address: St. Kilda, Victoria AU