Title of invention |
Server side sampling of databases |
Abstract |
Disclosed are a system and method for use with a data mining application operating on a large database having a large number of records. A selection attribute is chosen from among the plurality of attributes contained in the records of the database. The records in the database are scanned, and a randomizing function is applied to the selection attribute of each record to create a randomized record value. A selection criterion is then applied to identify records for inclusion in a subset of records (smaller than the original data set) by comparing the randomized record value of each record with the selection criterion. The subset of records whose randomized record values satisfy the selection criterion approximates the entire database but takes up less memory and can be evaluated or scanned much more quickly. |
Publication number |
US2003005087(A1) |
Publication date |
2003.01.02 |
Application number |
US20010864591 |
Filing date |
2001.05.24 |
Applicant |
MICROSOFT CORPORATION |
Inventors |
BERNHARDT JEFFREY R.;VINARSKY ILYA |
Classification (IPC) |
G06F7/00;G06F17/30;(IPC1-7):G06F7/00 |
Main classification |
G06F7/00 |
Agency |
|
Agent |
|
Principal claim |
|
Address |
|
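
The sampling steps described in the abstract (choose a selection attribute, apply a randomizing function to it, compare the result with a selection criterion) can be illustrated with a minimal Python sketch. This is not the patented implementation: the use of SHA-256 as the randomizing function, the threshold comparison as the selection criterion, and the names `randomized_value`, `sample_records`, `fraction`, `salt`, and `customer_id` are all assumptions made for illustration.

```python
import hashlib

HASH_SPACE = 2 ** 64  # size of the integer range the hash is mapped into


def randomized_value(attr_value, salt=""):
    """Map a selection attribute value to a pseudo-random integer in [0, 2**64).

    Assumption: SHA-256 stands in for the abstract's "randomizing function".
    """
    digest = hashlib.sha256(f"{salt}{attr_value}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def sample_records(records, selection_attribute, fraction, salt=""):
    """Yield records whose randomized value satisfies the selection criterion.

    Assumption: the criterion is "randomized value < fraction * 2**64",
    which keeps roughly `fraction` of the records.
    """
    threshold = int(fraction * HASH_SPACE)
    for record in records:  # single scan over the data set
        if randomized_value(record[selection_attribute], salt) < threshold:
            yield record


# Hypothetical data set: 10,000 records keyed by a made-up customer_id.
records = [{"customer_id": i, "amount": i % 7} for i in range(10_000)]
subset = list(sample_records(records, "customer_id", 0.10))
print(len(subset), "of", len(records), "records selected")
```

Because the randomized value depends only on the selection attribute (and the optional salt), repeated scans select the same records, and records sharing an attribute value are kept or dropped together.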