Title of invention SYSTEMS AND METHODS FOR IDENTIFYING ANOMALOUS DATA IN LARGE STRUCTURED DATA SETS AND QUERYING THE DATA SETS
Abstract The technology disclosed relates to automatic generation of tuples from a record set for outlier analysis. With this technology, a user need not specify which 1-tuples to combine into n-tuples. The tuples are generated from structured records organized into features (which could also be fields, objects, or attributes). Tuples are generated from combinations of feature values in the records. Thresholding is applied to manage the number of tuples generated. The technology disclosed further relates to indexing and searching high-dimensional tuple spaces in a computer-implemented system.
Publication number US2014304279(A1) Publication date 2014.10.09
Application number US201414244146 Filing date 2014.04.03
Applicant Salesforce.com, inc. Inventors Fuchs Matthew; Georgiev Stanislav
Classification G06F17/30 Main classification G06F17/30
Agency Agent
Principal claim 1. A system that identifies anomalous data in large structured data sets, the system including: a computer including memory; computer instructions causing the computer to implement: automatically expanding an existing tuple set of elements with features from a record set by adding one more feature to the existing tuple set and creating unique elements with the one more feature, wherein the unique elements in an expanded tuple set enumerate permutations of unique values of the features from the record set that are combined in the expanded tuple set; limiting unique elements in the expanded tuple set to inhabited feature value combinations by applying a threshold count criterion of 2 or more to counts of how often the feature value combinations of the unique elements are found in the record set and not retaining unique elements in the expanded tuple set that do not satisfy the threshold count criterion; after expanding the existing tuple set into the expanded tuple set and applying the threshold count criterion, comparing frequencies of the unique elements in the expanded tuple set to frequencies of the unique elements in a reference data set; and spotting outliers based on the comparing of the frequencies.
Address San Francisco CA US
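
The steps recited in the principal claim (expanding a tuple set by one feature, keeping only combinations found at least twice, then comparing frequencies against a reference data set) can be illustrated with a minimal Python sketch. This is not the patented implementation. The function names, the sample records, and the ratio-based outlier test are all assumptions made for illustration; the claim does not specify how the frequencies are compared.

```python
from collections import Counter


def count_tuples(records, features):
    """Count how often each combination of values for `features` occurs."""
    return Counter(tuple(r[f] for f in features) for r in records)


def apply_threshold(counts, min_count=2):
    """Retain only elements that meet the threshold count criterion."""
    return {elem: n for elem, n in counts.items() if n >= min_count}


def expand(records, features, existing, new_feature, min_count=2):
    """Expand an existing tuple set by one feature, then apply the threshold.

    Only records whose values for `features` form a retained element of
    `existing` contribute, so uninhabited prefixes are never extended.
    """
    expanded_features = features + (new_feature,)
    kept = [r for r in records if tuple(r[f] for f in features) in existing]
    counts = count_tuples(kept, expanded_features)
    return expanded_features, apply_threshold(counts, min_count)


def spot_outliers(elements, total, ref_counts, ref_total, ratio=3.0):
    """Flag elements whose relative frequency is at least `ratio` times the
    reference frequency (an assumed criterion, not taken from the claim)."""
    outliers = []
    for elem, n in elements.items():
        freq = n / total
        ref_freq = ref_counts.get(elem, 0) / ref_total
        if ref_freq == 0 or freq / ref_freq >= ratio:
            outliers.append(elem)
    return sorted(outliers)


# Hypothetical record set and reference data set.
records = (
    [{"country": "US", "browser": "Chrome"}] * 2
    + [{"country": "US", "browser": "Firefox"}] * 1
    + [{"country": "DE", "browser": "Tor"}] * 3
)
reference = (
    [{"country": "US", "browser": "Chrome"}] * 5
    + [{"country": "US", "browser": "Firefox"}] * 2
    + [{"country": "DE", "browser": "Chrome"}] * 2
    + [{"country": "DE", "browser": "Tor"}] * 1
)

# Start from thresholded 1-tuples, then add one more feature.
one_tuples = apply_threshold(count_tuples(records, ("country",)))
features, expanded = expand(records, ("country",), one_tuples, "browser")

# Compare frequencies in the record set to those in the reference set.
ref_counts = count_tuples(reference, features)
outliers = spot_outliers(expanded, len(records), ref_counts, len(reference))
```

In this sample, the ("US", "Firefox") combination occurs only once and is dropped by the threshold, while ("DE", "Tor") is flagged because it is far more frequent in the record set than in the reference set. Repeating `expand` with further features would build 3-tuples and beyond without the user choosing which 1-tuples to combine.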