Title of invention: Efficient data infrastructure for high dimensional data analysis
Abstract: Described is a technology by which high-dimensional source data, organized as rows of records with identifiers and columns comprising dimensions of data values, is processed into a file model for efficient access. An inverted index for any dimension is built by mapping raw dimension values to mapped values based on mapping entries in a dimension table. The record identifiers are arranged into subgroups according to their mapped values; a count and/or an offset may be maintained for locating each subgroup. The raw values for a dimension are maintained in a raw value file. For sparse data, the raw value file may be compressed, e.g., by excluding nulls and associating a record identifier with each non-null value. A data manager provides access to the data in the data files, for example by offering various functions and using caching for efficiency.
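The indexing scheme in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function names, the in-memory lists standing in for files, and the one-integer-per-distinct-value dimension table are all assumptions made for illustration. It shows record identifiers grouped by mapped value, located by an offset and a count, and a sparse raw value list that omits nulls.

```python
from collections import defaultdict


def build_dimension_table(raw_values):
    """Hypothetical dimension table: each distinct non-null raw value
    gets a small integer mapped value, in order of first appearance."""
    table = {}
    for value in raw_values:
        if value is not None and value not in table:
            table[value] = len(table)
    return table


def build_inverted_index(raw_values, table):
    """Arrange record ids into subgroups by mapped value.

    Returns (record_ids, offsets, counts): the subgroup for mapped value m
    is record_ids[offsets[m] : offsets[m] + counts[m]].
    """
    groups = defaultdict(list)
    for record_id, value in enumerate(raw_values):
        if value is not None:
            groups[table[value]].append(record_id)

    record_ids, offsets, counts = [], [], []
    for mapped in range(len(table)):
        offsets.append(len(record_ids))
        counts.append(len(groups[mapped]))
        record_ids.extend(groups[mapped])
    return record_ids, offsets, counts


def lookup(raw_value, table, record_ids, offsets, counts):
    """Return the record ids whose dimension value equals raw_value."""
    mapped = table.get(raw_value)
    if mapped is None:
        return []
    start = offsets[mapped]
    return record_ids[start:start + counts[mapped]]


def compress_sparse(raw_values):
    """Sparse raw value list: drop nulls, keep (record_id, value) pairs."""
    return [(rid, v) for rid, v in enumerate(raw_values) if v is not None]


# One dimension over six records; None marks a null (made-up sample data).
country = ["US", "CN", None, "US", None, "CN"]
table = build_dimension_table(country)
record_ids, offsets, counts = build_inverted_index(country, table)
cn_records = lookup("CN", table, record_ids, offsets, counts)
sparse = compress_sparse(country)
```

Storing one contiguous run of record identifiers per mapped value means a lookup needs only the offset and count for that value, which is presumably why the abstract mentions maintaining them.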
Publication number: US7870114(B2)  Publication date: 2011.01.11
Application number: US20070818879  Filing date: 2007.06.15
Applicant: MICROSOFT CORPORATION  Inventors: ZHANG HAIDONG; LIU GUOWEI; LI YANTAO; SUN BING; WANG JIAN
Classification: G06F17/30  Main classification: G06F17/30
Agency:  Agent:
Main claim:
Address: