Title Multivariate Insight Discovery Approach
Abstract A raw dataset including measures and dimensions is processed, by a preprocessing module, using an algorithm that produces a preprocessed dataset such that at least one type of statistical analysis of the preprocessed dataset yields the same results as the same type of statistical analysis of the raw dataset. The preprocessed dataset is then analyzed by a statistical analysis module to identify subsets of the preprocessed dataset that include a non-random structure or pattern. The analysis of the preprocessed dataset includes the at least one type of statistical analysis that produces the same results for both the preprocessed and raw datasets. The identified subsets are then ranked by a statistical ranker based on the analysis of the preprocessed dataset, and a subset is selected for visualization based on the rankings. A visualization module then generates a visualization of the selected identified subset that highlights a non-random structure of the selected subset.
Publication number US2016103902(A1) Publication date 2016.04.14
Application number US201414511047 Filing date 2014.10.09
Applicant Moser Flavia;MacAulay Alexander Kennedy;Gosper Julian Inventor Moser Flavia;MacAulay Alexander Kennedy;Gosper Julian
Classification G06F17/30;G06T11/20 Main classification G06F17/30
Agency Agent
Principal claim 1. A method comprising: accessing a dataset including measures and dimensions by a preprocessing module including at least one processor; processing the dataset, by the preprocessing module, to generate a preprocessed dataset such that at least one type of statistical analysis of the preprocessed dataset yields equal results to the same type of statistical analysis of the dataset; analyzing the preprocessed dataset, by a statistical analysis module including at least one processor, to identify subsets of the preprocessed dataset that include a non-random structure, the analyzing including the at least one type of statistical analysis; generating a score for each of the identified subsets, by the statistical analysis module, based on the non-random structures included in each of the identified subsets; ranking each of the identified subsets, by a statistical ranking module including at least one processor, based on the score generated for each of the identified subsets and selecting an identified subset based on the ranking of the identified subset; and generating, by a visualization module including at least one processor, a visualization that highlights a non-random structure of the selected identified subset.
Address Vancouver CA
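Illustrative sketch (not part of the patent record). The record does not disclose a specific preprocessing algorithm or scoring function, so the following Python is only one possible reading of the steps in claim 1, under stated assumptions: preprocessing is taken to be aggregation of the raw rows into per-group sufficient statistics (count, sum, sum of squares), which leaves the mean and variance identical to those computed on the raw rows; "non-random structure" is approximated by how far a group's mean lies from the overall mean, in standard-error units. The function names `preprocess`, `overall_mean`, `score_subsets`, and `rank_subsets` and the sample data are hypothetical, and the visualization step is omitted.

```python
from collections import defaultdict
from math import sqrt


def preprocess(rows, dimension, measure):
    """Collapse raw rows into (count, sum, sum of squares) per dimension value.

    Assumption: this aggregation stands in for the patent's unspecified
    preprocessing algorithm. Means and variances computed from these
    statistics equal those computed from the raw rows.
    """
    groups = defaultdict(lambda: [0, 0.0, 0.0])
    for row in rows:
        stats = groups[row[dimension]]
        value = row[measure]
        stats[0] += 1
        stats[1] += value
        stats[2] += value * value
    return {key: tuple(stats) for key, stats in groups.items()}


def overall_mean(pre):
    """Mean of the measure, recovered from the preprocessed statistics only."""
    count = sum(s[0] for s in pre.values())
    total = sum(s[1] for s in pre.values())
    return total / count


def score_subsets(pre):
    """Score each group by |group mean - overall mean| / standard error.

    Assumption: a hypothetical score, not the one used in the patent.
    """
    count = sum(s[0] for s in pre.values())
    total_sq = sum(s[2] for s in pre.values())
    mean = overall_mean(pre)
    variance = total_sq / count - mean * mean  # population variance
    scores = {}
    for key, (n, total, _) in pre.items():
        if variance <= 0:
            scores[key] = 0.0  # constant measure: no group stands out
        else:
            scores[key] = abs(total / n - mean) / sqrt(variance / n)
    return scores


def rank_subsets(scores):
    """Return (subset, score) pairs ordered from highest to lowest score."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sample data: one dimension ("region") and one measure ("sales").
rows = [
    {"region": "east", "sales": 10.0},
    {"region": "east", "sales": 12.0},
    {"region": "west", "sales": 11.0},
    {"region": "west", "sales": 9.0},
    {"region": "north", "sales": 30.0},
    {"region": "north", "sales": 28.0},
]
pre = preprocess(rows, "region", "sales")
ranked = rank_subsets(score_subsets(pre))
selected_subset = ranked[0][0]  # the subset that would be passed on for visualization
```

In this sketch the selected subset is simply the highest-ranked one; a visualization module would then chart that subset so that the deviating values are highlighted.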