Title of invention: Encoding data stored in a column-oriented manner
Abstract: Data stored in a column-oriented manner is encoded using a data mining algorithm that finds column patterns among a set of data tuples, where each data tuple contains a set of columns. When looking for column patterns, the data mining algorithm treats all columns, all column combinations, and all column orderings in the same manner. Column values occurring in the column patterns are ordered, based on their frequencies, into a prefix tree, which defines a pattern order. The data tuples are sorted according to the pattern order, and the columns of the sorted data tuples are encoded using run-length encoding.
Publication number: US9325344(B2)  Publication date: 2016.04.26
Application number: US201113206827  Filing date: 2011.08.10
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION  Inventors: Beier Felix; Draese Oliver; Stolze Knut
Classification: G06F17/30; H03M7/40; H03M7/46  Main classification: G06F17/30
Agency: Edell, Shapiro & Finnan, LLC  Agent: Kashef Mohammed; Edell, Shapiro & Finnan, LLC
Main claim: 1. A method for encoding data stored in a column-oriented manner, comprising: finding column patterns among a set of data tuples, wherein each data tuple contains a set of columns, and wherein the column patterns are identified based upon a single column individually, and upon a combination of columns together, by grouping each column together with one or more other columns, without regard for column order, and evaluating rows of the grouped columns to determine the column patterns; identifying single column values occurring in the column patterns, in order to reduce a number of nodes of a prefix tree comprising a plurality of nodes; ordering the single column values occurring in the column patterns based on their corresponding frequencies into the prefix tree, wherein the prefix tree defines a pattern order and is constructed to share the nodes corresponding to the single column values occurring more frequently; sorting the data tuples according to the pattern order, resulting in sorted data tuples; and encoding columns of the sorted data tuples using run-length encoding.
Address: Armonk NY US
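
Illustrative sketch (not part of the patent record): the steps named in the main claim (frequency-ordered prefix tree, sort by pattern order, run-length encoding) can be illustrated with a small Python example. This is a simplified sketch under stated assumptions, not the patented method: it skips the pattern-mining step and ranks individual (column, value) items by their global frequency, in the spirit of an FP-tree. The function names `pattern_order_sort`, `rle`, and `encode` are hypothetical.

```python
from collections import Counter
from itertools import groupby


def pattern_order_sort(rows):
    """Reorder rows by a depth-first walk of a frequency-ordered prefix tree.

    Assumption: single (column, value) items stand in for mined column
    patterns; a real implementation would mine multi-column patterns first.
    """
    # Count how often each (column index, value) item occurs.
    freq = Counter((c, v) for row in rows for c, v in enumerate(row))
    # Rank items, most frequent first; ties broken deterministically.
    ranked = sorted(freq, key=lambda it: (-freq[it], it[0], repr(it[1])))
    rank = {item: i for i, item in enumerate(ranked)}

    # Insert each row as a path of item ranks, most frequent item first,
    # so rows that share frequent values share prefix-tree nodes.
    root = {"children": {}, "rows": []}
    for row in rows:
        node = root
        for r in sorted(rank[(c, v)] for c, v in enumerate(row)):
            node = node["children"].setdefault(r, {"children": {}, "rows": []})
        node["rows"].append(row)

    # A depth-first traversal of the tree yields the pattern order.
    ordered = []

    def walk(node):
        ordered.extend(node["rows"])
        for r in sorted(node["children"]):
            walk(node["children"][r])

    walk(root)
    return ordered


def rle(column):
    """Run-length encode one column as (value, run length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(column)]


def encode(rows):
    """Sort rows by pattern order, then run-length encode each column."""
    ordered = pattern_order_sort(rows)
    return [rle(col) for col in zip(*ordered)]


rows = [
    ("DE", "red"), ("US", "blue"), ("DE", "blue"),
    ("US", "red"), ("DE", "red"), ("DE", "red"),
]
encoded = encode(rows)
```

In this toy input the unsorted first column alternates between `DE` and `US`, which gives five runs; after sorting by pattern order it collapses to two runs. How much a real table benefits depends on how strongly its column values co-occur.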