Title of invention: Compression of tables based on occurrence of values
Abstract: Methods and apparatus, including computer program products, for compression of tables based on occurrence of values. In general, a number representing an amount of occurrences of a frequently occurring value in a group of adjacent rows of a column is generated, a vector representing whether the frequently occurring value exists in a row of the column is generated, and the number and the vector are stored to enable searches of the data represented by the number and the vector. The vector may omit a portion representing the group of adjacent rows. The values may be dictionary-based compression values representing business data such as business objects. The compression may be performed in-memory, in parallel, to improve memory utilization, network bandwidth consumption, and processing performance.
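The scheme summarized in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes the column is a list of integer dictionary value IDs, that the group of adjacent rows is the leading run at the top of the column, and that the remaining values are kept in a plain list. The names `compress_column`, `decompress_column`, and `rows_with_value` are hypothetical.

```python
from collections import Counter


def compress_column(value_ids):
    """Compress a column of dictionary value IDs around its most frequent value."""
    if not value_ids:
        return {"frequent": None, "prefix_count": 0, "bits": [], "rest": []}
    frequent, _ = Counter(value_ids).most_common(1)[0]
    # The number: how many adjacent rows at the top hold the frequent value.
    prefix_count = 0
    for v in value_ids:
        if v != frequent:
            break
        prefix_count += 1
    # The vector omits the leading group, which the number already describes.
    tail = value_ids[prefix_count:]
    bits = [1 if v == frequent else 0 for v in tail]
    rest = [v for v in tail if v != frequent]  # values the vector cannot express
    return {"frequent": frequent, "prefix_count": prefix_count,
            "bits": bits, "rest": rest}


def decompress_column(c):
    """Rebuild the original list of value IDs."""
    out = [c["frequent"]] * c["prefix_count"]
    rest = iter(c["rest"])
    for b in c["bits"]:
        out.append(c["frequent"] if b else next(rest))
    return out


def rows_with_value(c, value_id):
    """Search the compressed form directly for rows holding value_id."""
    n = c["prefix_count"]
    if value_id == c["frequent"]:
        return list(range(n)) + [n + i for i, b in enumerate(c["bits"]) if b]
    rows, rest = [], iter(c["rest"])
    for i, b in enumerate(c["bits"]):
        if not b and next(rest) == value_id:
            rows.append(n + i)
    return rows


column = [0, 0, 0, 0, 3, 0, 1, 0, 2]
compressed = compress_column(column)
print(compressed["prefix_count"], compressed["bits"], compressed["rest"])
print(rows_with_value(compressed, 0))
```

In this sketch the four leading rows cost only one stored number, and a search for the frequent value never touches the list of remaining values.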
Publication number: US8768899(B2)  Publication date: 2014.07.01
Application number: US201213356567  Filing date: 2012.01.23
Applicant: SAP AG  Inventors: Faerber Franz; Radestock Guenter; Ross Andrew
Classification: G06F17/30  Main classification: G06F17/30
Agency: Mintz Levin Cohn Ferris Glovsky and Popeo, P.C.  Agent: Mintz Levin Cohn Ferris Glovsky and Popeo, P.C.
Principal claim: 1. A non-transitory computer program product, tangibly embodied in a computer-readable medium, the computer program product being operable to cause data processing apparatus to perform operations comprising: generating columns of dictionary-based compression values, the columns of dictionary-based compression values being based on a dictionary of possible values for each column of a column-based database and being structured business data; sorting the columns such that a first column ordered first in an ordering of the columns has a most-frequently occurring value of the first column occurring more frequently than frequently occurring values of other columns, such that the sorted first column includes, at one end of the first column, instances of the most-frequently occurring value of the first column and the other columns include, at an end of each of the other columns, instances of most-frequently occurring values of the other columns; generating a bit vector for at least one of the columns, each of the bit vectors representing most-frequently occurring values of a respective column, the generating comprising having each bit of the bit vector represent whether the most-frequently occurring value exists in a respective row of the respective column; generating a number for each of the columns having an associated bit vector, the number representing an amount of occurrences of the most-frequently occurring value of one end of the column, wherein the most frequently occurring value comprises a plurality of bits; removing from each of the bit vectors a representation of the most-frequently occurring value of one end of the respective column based on the number associated with the bit vector; storing the number associated with each of the bit vectors to enable non-volatile memory searches of the data represented by the number associated with each of the bit vectors and each of the bit vectors; and generating a delta index separate from the columns and configured to store changes to at least one column that occur after the generation and storing of the compression values, wherein the changes include at least one of an addition, a modification, or a deletion to the at least one column, and wherein the stored changes are comprised of dictionary values and value identifiers assigned in an attribute table, the dictionary values being added to the delta index in chronological order to reflect an ordering of changes made to data in the columns over time.
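Two steps of the claim, the sorting and the delta index, can be sketched in Python as follows. This is one possible reading under stated assumptions, not the claimed implementation. It assumes rows are tuples of integer dictionary value IDs, that "one end" of a column is its top, that rows are ordered lexicographically on whether each column holds its most frequent value, and that the delta index is a simple append-only log. The names `sort_rows_by_frequent_values` and `DeltaIndex` are hypothetical.

```python
from collections import Counter


def sort_rows_by_frequent_values(rows):
    """Order columns and rows so frequent values gather at the top of each column.

    Returns (sorted_rows, column_order, frequent_value_per_column).
    """
    if not rows:
        return [], [], []
    n_cols = len(rows[0])
    # Most frequent value and its count, per column.
    frequent = [Counter(r[c] for r in rows).most_common(1)[0]
                for c in range(n_cols)]
    # Column ordering: the column whose frequent value occurs most often is first.
    order = sorted(range(n_cols), key=lambda c: -frequent[c][1])

    # False sorts before True, so rows holding the frequent value come first,
    # compared column by column in the chosen ordering.
    def key(row):
        return tuple(row[c] != frequent[c][0] for c in order)

    return sorted(rows, key=key), order, [value for value, _ in frequent]


class DeltaIndex:
    """Append-only log of changes kept separate from the compressed columns."""

    def __init__(self):
        self.dictionary = []  # dictionary values in order of first arrival
        self.value_ids = {}   # value -> value identifier
        self.changes = []     # (operation, row, value_id) in chronological order

    def record(self, operation, row, value=None):
        value_id = None
        if value is not None:
            if value not in self.value_ids:
                self.value_ids[value] = len(self.dictionary)
                self.dictionary.append(value)
            value_id = self.value_ids[value]
        self.changes.append((operation, row, value_id))


rows = [(1, 5), (0, 5), (0, 7), (0, 5), (2, 6), (0, 6)]
sorted_rows, order, frequent = sort_rows_by_frequent_values(rows)
print(sorted_rows)

delta = DeltaIndex()
delta.record("add", 6, "Berlin")
delta.record("modify", 2, "Paris")
delta.record("delete", 4)
print(delta.dictionary, delta.changes)
```

After sorting, each column can be reduced to a count for its leading run plus a shortened bit vector, while later additions, modifications, and deletions accumulate in the delta index without forcing the sorted columns to be rebuilt on every change.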
Address: Walldorf DE