Title of Invention |
Hierarchical index based compression |
Abstract |
Computer-readable media, systems, and methods for hierarchical index based compression are described. In embodiments, a hierarchical data log or key-value pair based data log, such as a JSON log, is received and a tree-structured index (index tree) is recursively constructed. In one embodiment, the log comprises search-engine user interaction information. Structural information of the log is preserved by the index tree structure; for example, each node of the log has a corresponding index-tree node. Frequently repeating keys, values, and correlated key-value pairs may be stored in the index-tree node, which may be indexed using multiple levels of detail including a raw-string level for raw string representations of the node, a first level for indexing keys and common values, and a second level for indexing correlated key-value pairs. The index tree may be used to compress rows of the data log and also used to decompress and restore the log. |
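The row-level use of the index described in the abstract can be illustrated with a minimal sketch. It assumes flat JSON rows and three plain dictionaries standing in for the raw-string, first, and second levels. The names `compress_row` and `decompress_row`, the token tags `R`, `T`, `P`, `K`, `L`, and the table layouts are hypothetical and are not taken from the patent.

```python
import json


def compress_row(row, raw_index, key_index, pair_index):
    """Encode one flat log row against three hypothetical lookup tables."""
    raw = json.dumps(row, sort_keys=True)
    if raw in raw_index:                      # raw-string level: whole row is known
        return ["R", raw_index[raw]]
    tokens = []
    for key, value in row.items():
        pair = (key, json.dumps(value, sort_keys=True))
        if pair in pair_index:                # second level: known key-value pair
            tokens.append(["P", pair_index[pair]])
        elif key in key_index:                # first level: known key, literal value
            tokens.append(["K", key_index[key], value])
        else:                                 # nothing indexed: keep both literally
            tokens.append(["L", key, value])
    return ["T", tokens]


def decompress_row(encoded, raw_index, key_index, pair_index):
    """Restore a row produced by compress_row using the same tables."""
    raw_by_id = {i: s for s, i in raw_index.items()}
    key_by_id = {i: k for k, i in key_index.items()}
    pair_by_id = {i: p for p, i in pair_index.items()}
    if encoded[0] == "R":
        return json.loads(raw_by_id[encoded[1]])
    row = {}
    for token in encoded[1]:
        if token[0] == "P":
            key, text = pair_by_id[token[1]]
            row[key] = json.loads(text)
        elif token[0] == "K":
            row[key_by_id[token[1]]] = token[2]
        else:
            row[token[1]] = token[2]
    return row
```

In this sketch a row that matches the raw-string table collapses to a single identifier, while other rows fall back to per-pair and per-key identifiers, so decompression only needs the same three tables.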
Publication Number |
US9355111(B2) |
Publication Date |
2016.05.31 |
Application Number |
US201414265806 |
Filing Date |
2014.04.30 |
Applicant |
Microsoft Technology Licensing, LLC |
Inventors |
He Liu;Cao Bo;Yan An |
Classification Codes |
G06F7/02;G06F17/30 |
Main Classification Code |
G06F7/02 |
Agency |
|
Agents |
Ream Dave;Wong Tom;Minhas Micky |
Principal Claim |
1. One or more computer-readable storage media having instructions embodied thereon that, when executed, perform a method for building an index tree for compressing a logfile, the method comprising:
receiving a logfile comprising one or more nodes, each node having one or more keys, values, and key-value pairs;
for a first node, determining whether to index a raw string representation of the node;
upon determining to index the raw string, storing the raw string representation of the node in a raw-string level index;
upon determining not to index the raw string:
(1) for each key in the node, storing the key in a first level index;
(2) for each key-value pair in the node, determining that the key-value pair is correlated; and
(3) upon determining that the key-value pair is correlated, storing the key-value pair in a second level index; and
creating an index tree corresponding to the logfile from the raw string, first, and second level indexes of the first node. |
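The claimed steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented method: the claim does not say how the raw-string decision or the correlation test is made, so the sketch substitutes simple repetition counts (`raw_min`, `pair_min`). The names `IndexNode`, `build_index`, and `_intern` are hypothetical, and nested objects are handled by recursing into a child index node per key.

```python
import json
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class IndexNode:
    raw_strings: dict = field(default_factory=dict)  # raw-string level: text -> id
    keys: dict = field(default_factory=dict)         # first level: key -> id
    pairs: dict = field(default_factory=dict)        # second level: (key, value text) -> id
    children: dict = field(default_factory=dict)     # key -> IndexNode for nested objects


def _intern(table, item):
    """Return the id of item in table, assigning the next free id if it is new."""
    return table.setdefault(item, len(table))


def build_index(nodes, raw_min=2, pair_min=2):
    """Build an index tree for a list of dict nodes (assumed thresholds)."""
    index = IndexNode()
    raws = [json.dumps(n, sort_keys=True) for n in nodes]
    raw_counts = Counter(raws)
    pair_counts = Counter(
        (k, json.dumps(v, sort_keys=True))
        for n in nodes for k, v in n.items() if not isinstance(v, dict)
    )
    nested = {}
    for node, raw in zip(nodes, raws):
        # Assumption: index the raw string when the whole node repeats.
        if raw_counts[raw] >= raw_min:
            _intern(index.raw_strings, raw)
            continue
        for key, value in node.items():
            _intern(index.keys, key)                 # (1) every key, first level
            if isinstance(value, dict):
                nested.setdefault(key, []).append(value)
                continue
            pair = (key, json.dumps(value, sort_keys=True))
            # Assumption: "correlated" is approximated by a repetition count.
            if pair_counts[pair] >= pair_min:        # (2) and (3), second level
                _intern(index.pairs, pair)
    for key, subnodes in nested.items():
        index.children[key] = build_index(subnodes, raw_min, pair_min)
    return index


log = [
    {"q": "cats", "market": "en-US", "clicks": 1},
    {"q": "dogs", "market": "en-US", "clicks": 2},
    {"q": "cats", "market": "en-US", "clicks": 1},
    {"q": "birds", "market": "en-GB", "clicks": 3},
]
idx = build_index(log)
```

With this sample log, the repeated "cats" row goes to the raw-string level, the three keys go to the first level, and only the `market`/`en-US` pair is frequent enough to reach the second level.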
Address |
Redmond WA US |