Title: Classification of data in main memory database systems
Abstract: Various technologies described herein pertain to classifying data in a main memory database system. A record access log can include a sequence of record access observations logged over a time period from a beginning time to an end time. Each of the record access observations can include a respective record ID and read timestamp. The record access log can be scanned in reverse, from the end time towards the beginning time. Further, access frequency estimate data can be calculated for the records whose record IDs are read from the record access log. The access frequency estimate data can include a respective upper bound and a respective lower bound of the access frequency estimate for each of the records. Moreover, the records can be classified based on these upper and lower bounds, such that K records can be classified as frequently accessed records.
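The abstract does not state how the access frequency estimates or their bounds are computed. As a hedged illustration only, assume each record's estimate is an exponentially smoothed access count over integer time slices t_b through t_e with smoothing constant α (the estimator, the symbols α, t_b, t_e, A_r, L_r and U_r are all assumptions, not taken from the source). After the reverse scan has read every observation with timestamp at least t, the bounds for record r could then be written as:

```latex
% A_r: set of time slices in which record r was accessed (assumed notation)
\[
L_r(t) = \sum_{s \in A_r,\; s \ge t} \alpha (1-\alpha)^{t_e - s}
\]
\[
U_r(t) = L_r(t) + \sum_{s = t_b}^{t-1} \alpha (1-\alpha)^{t_e - s}
       = L_r(t) + (1-\alpha)^{t_e - t + 1} - (1-\alpha)^{t_e - t_b + 1}
\]
```

The lower bound assumes no further accesses in the unscanned slices; the upper bound assumes an access in every unscanned slice, and the closed form follows from the geometric series. A record not yet seen has a lower bound of 0 and an upper bound equal to the remaining term, so once the K-th largest lower bound is at least every other record's upper bound, the top K can no longer change and scanning can stop early.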
Publication No.: US9514174 (B2); Publication Date: 2016.12.06
Application No.: US201213539347; Filing Date: 2012.06.30
Applicant: Microsoft Technology Licensing, LLC; Inventors: Levandoski Justin Jon; Larson Per-Ake
Classification: G06F17/30; Main Classification: G06F17/30
Agents: Corie Alin; Swain Sandy; Minhas Micky
Main Claim: 1. A method of classifying data in a main memory database system, comprising:
initiating scanning of a record access log retained in a data repository in reverse from an end time towards a beginning time utilizing at least one processor, wherein the record access log comprises a sequence of record access observations logged over a time period from the beginning time to the end time, wherein each of the record access observations comprises a respective record identifier (ID) and a read timestamp;
calculating access frequency estimate data for records corresponding to record IDs read from the record access log as the record access log is scanned in reverse, wherein the access frequency estimate data comprises respective upper bounds of access frequency estimates and respective lower bounds of the access frequency estimates for each of the records;
storing, in the data repository, the access frequency estimate data for the records corresponding to the record IDs read from the record access log;
classifying the records based on the respective upper bounds of the access frequency estimates and the respective lower bounds of the access frequency estimates as the record access log is scanned in reverse, wherein K records are classified as being frequently accessed records;
continuing the scanning of the record access log in reverse, the calculating of the access frequency estimate data for the records as the record access log is scanned in reverse, and the classifying of the records as the record access log is scanned in reverse;
discontinuing the scanning of the record access log based on classification of the records, the scanning being discontinued prior to reading a record access observation for the beginning time; and
removing, from the data repository, the access frequency estimate data for a subset of the records as the record access log is scanned in reverse.
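The claimed steps (reverse scan, bound maintenance, top-K classification, pruning stored estimates, and early termination) can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the exponential-smoothing estimator, integer time slices, counting a record at most once per slice, and the names `Observation`, `classify_hot`, `alpha`, `t_begin` and `t_end` are all hypothetical. The dictionary `lower` stands in for the data repository that holds the access frequency estimate data.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Observation:
    record_id: int
    timestamp: int  # assumption: an integer time-slice number


def classify_hot(log, k, alpha, t_begin, t_end):
    """Return (hot record IDs, time slice at which the reverse scan stopped).

    `log` must be sorted by ascending timestamp. The exponentially smoothed
    frequency estimate is an assumption of this sketch.
    """
    q = 1.0 - alpha

    def weight(t):
        # Contribution of an access in slice t to the smoothed estimate.
        return alpha * q ** (t_end - t)

    def remaining(t):
        # Total weight of all slices strictly earlier than t (not yet scanned).
        return q ** (t_end - t + 1) - q ** (t_end - t_begin + 1)

    lower = {}         # record ID -> lower bound (estimate accumulated so far)
    discarded = set()  # records proven unable to reach the top K
    ts = t_end
    # Scan the log in reverse, one time slice at a time.
    for ts, group in groupby(reversed(log), key=lambda o: o.timestamp):
        # Count each surviving record at most once per time slice.
        for rid in {o.record_id for o in group} - discarded:
            lower[rid] = lower.get(rid, 0.0) + weight(ts)
        rest = remaining(ts)  # upper bound = lower bound + rest
        if len(lower) < k:
            continue
        ranked = sorted(lower, key=lower.get, reverse=True)
        kth = lower[ranked[k - 1]]
        # Remove stored estimates whose upper bound is below the K-th lower bound.
        for rid in ranked[k:]:
            if lower[rid] + rest < kth:
                del lower[rid]
                discarded.add(rid)
        # Largest upper bound outside the top K; an unseen record's is `rest`.
        best_other = max(
            (lower[r] + rest for r in ranked[k:] if r in lower), default=rest
        )
        if kth >= best_other:
            return set(ranked[:k]), ts  # classification is final: stop early
    ranked = sorted(lower, key=lower.get, reverse=True)
    return set(ranked[:k]), ts
```

For example, with record 1 accessed in every slice from 1 to 10, record 3 once in slice 9 and record 2 once in slice 10, calling `classify_hot(log, k=1, alpha=0.5, t_begin=1, t_end=10)` returns `({1}, 9)`: the scan stops after slice 9 without reading the older observations. Re-sorting all candidates per slice keeps the sketch simple rather than efficient; a production version would more likely maintain the top K incrementally.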
Address: Redmond, WA, US