Title: Method and system for estimating the size of a joined table
Abstract: A method, system, and/or computer program product estimate the cardinality of a joined table (T) obtained by joining at least a first data column (R) and a second data column (S), where R and S each comprise attribute values. A first density distribution function f(x) describes the frequency of the attribute values of R. A second density distribution function g(x) describes the frequency of the attribute values of S. First information on the values in R is based on a sample of values of R. Second information on the values in S is based on a sample of values of S. One or more processors then estimate the cardinality of the joined table (T) based on the first and second density distribution functions (f(x), g(x)) and the first and second information on values.
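The abstract does not state the estimation formula itself. For orientation only, a common textbook estimator for an equi-join treats f(x) and g(x) as relative frequencies (each summing to 1) and multiplies by the row counts: |T| ≈ |R| · |S| · Σ_x f(x) · g(x). The sketch below assumes that estimator and builds the densities from value samples; the function names `density_from_sample` and `estimate_join_cardinality` are hypothetical, and this is not presented as the patented method.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def density_from_sample(sample: Iterable[Hashable]) -> Dict[Hashable, float]:
    """Relative frequency of each attribute value observed in a sample."""
    counts = Counter(sample)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


def estimate_join_cardinality(
    f: Dict[Hashable, float],
    g: Dict[Hashable, float],
    rows_r: int,
    rows_s: int,
) -> float:
    """Estimate |T| for an equi-join of R and S (assumed textbook estimator).

    A row of R and a row of S join when both carry the same value x, so the
    expected number of matching pairs is |R| * |S| * sum_x f(x) * g(x).
    Values present in only one column contribute nothing.
    """
    overlap = sum(p * g.get(x, 0.0) for x, p in f.items())
    return rows_r * rows_s * overlap


# Usage: when the "sample" is the whole column, the estimate is exact.
f = density_from_sample([1, 1, 2, 3])   # R
g = density_from_sample([1, 2, 2, 4])   # S
print(estimate_join_cardinality(f, g, rows_r=4, rows_s=4))
```

With real samples the densities are only approximations, so the estimate inherits their sampling error; values that are rare in the column may be missing from a small sample altogether.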
Publication number: US9460153 (B2); Publication date: 2016.10.04
Application number: US201314053056; Filing date: 2013.10.14
Applicant: International Business Machines Corporation; Inventors: Gruszecki Artur M.; Kazalski Tomasz; Milka Grzegorz S.; Skibski Konrad K.; Stradomski Tomasz
Classification: G06F17/30; Main classification: G06F17/30
Agency: Law Office of Jim Boice; Agent: Law Office of Jim Boice
Main claim: 1. A method for optimizing resource usage when accessing a database based on an estimation of a cardinality of a joined table (T) obtained by joining at least a first data column (R) and a second data column (S), wherein R and S each comprise attribute values, and wherein the method comprises: receiving, by one or more processors, a first density distribution function f(x) describing a frequency of attribute values of the first data column (R); receiving, by one or more processors, a second density distribution function g(x) describing a frequency of attribute values of the second data column (S); receiving, by one or more processors, a first information on values in the first data column (R) based on a sample of values of the first data column (R); receiving, by one or more processors, a second information on values in the second data column (S) based on a sample of values of the second data column (S); estimating, by one or more processors, a cardinality of a joined table (T) based on the first and second density distribution functions (f(x), g(x)) and the first and second information on values; receiving, by one or more processors, a request from a client computer for data from a database in a database server; comparing, by one or more processors, an estimated cardinality of the joined table (T) to estimated cardinalities of other joined tables created from the database; in response to determining that the estimated cardinality of the joined table (T) is less than any of the estimated cardinalities of the other joined tables created from the database, minimizing execution time and computer resource usage when responding to the request for data by utilizing, by one or more processors, the joined table (T) before utilizing any of the other joined tables created from the database; and generating and transmitting, by one or more processors, the joined table (T) to a requester of the request for data.
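The last steps of claim 1 compare the estimated cardinality of T with those of other candidate joined tables and use the smallest one first. A minimal sketch of that ordering step, assuming the estimates are already available as a mapping from a hypothetical join label (e.g. "R_S") to its estimated row count; `order_joins_by_cardinality` is a hypothetical name, not an API from the patent or any database engine.

```python
from typing import Dict, List


def order_joins_by_cardinality(estimates: Dict[str, float]) -> List[str]:
    """Return join labels from smallest to largest estimated cardinality.

    The first label is the joined table to utilize before the others;
    sorted() is stable, so ties keep their original insertion order.
    """
    return sorted(estimates, key=lambda label: estimates[label])


# Usage with made-up estimates for three candidate joins.
estimates = {"R_S": 4.0, "R_U": 10.0, "S_U": 7.5}
plan = order_joins_by_cardinality(estimates)
print(plan[0])  # the join to use first
```

Real optimizers weigh more than output size (access paths, index availability, intermediate result costs), so ranking by estimated cardinality alone is a simplification that mirrors only the comparison the claim describes.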
Address: Armonk, NY, US