Title: Hierarchical index based compression
Abstract: Computer-readable media, systems, and methods for hierarchical index based compression are described. In embodiments, a hierarchical data log or key-value pair based data log, such as a JSON log, is received and a tree-structured index (index tree) is recursively constructed. In one embodiment, the log comprises search-engine user interaction information. Structural information of the log is preserved by the index tree structure; for example, each node of the log has a corresponding index-tree node. Frequently repeating keys, values, and correlated key-value pairs may be stored in the index-tree node, which may be indexed using multiple levels of detail including a raw-string level for raw string representations of the node, a first level for indexing keys and common values, and a second level for indexing correlated key-value pairs. The index tree may be used to compress rows of the data log and also to decompress and restore the log.
Publication number: US9355111(B2) Publication date: 2016.05.31
Application number: US201414265806 Filing date: 2014.04.30
Applicant: Microsoft Technology Licensing, LLC Inventors: He Liu;Cao Bo;Yan An
Classification: G06F7/02;G06F17/30 Main classification: G06F7/02
Agency: Agents: Ream Dave;Wong Tom;Minhas Micky
Main claim: 1. One or more computer-readable storage media having instructions embodied thereon that, when executed, perform a method for building an index tree for compressing a logfile, the method comprising: receiving a logfile comprising one or more nodes, each node having one or more keys, values, and key-value pairs; for a first node, determining whether to index a raw string representation of the node; upon determining to index the raw string, storing the raw string representation of the node in a raw-string level index; upon determining not to index the raw string: (1) for each key in the node, storing the key in a first level index; (2) for each key-value pair in the node, determining that the key-value pair is correlated; and (3) upon determining that the key-value pair is correlated, storing the key-value pair in a second level index; and creating an index tree corresponding to the logfile from the raw string, first, and second level indexes of the first node.
Address: Redmond WA US
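The index-building steps recited in the main claim, together with the compress and restore use mentioned in the abstract, can be illustrated with a minimal Python sketch. It is not the patented implementation. The record does not say how the method decides whether to index a raw string or how it determines that a key-value pair is correlated, so the sketch substitutes a simple frequency test for both. The names `IndexNode`, `build_index`, `compress`, `decompress`, the thresholds `raw_ratio` and `pair_ratio`, the tagged-list output format, and the sample rows are all hypothetical.

```python
import json
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class IndexNode:
    raw: list = field(default_factory=list)       # raw-string level index
    keys: list = field(default_factory=list)      # first level index: keys
    pairs: list = field(default_factory=list)     # second level index: (key, JSON value)
    children: dict = field(default_factory=dict)  # key -> IndexNode for nested nodes


def build_index(samples, raw_ratio=0.5, pair_ratio=0.5):
    """Recursively build an index node from a non-empty list of dicts.

    ASSUMPTION: "index the raw string" and "correlated" are both decided
    by a frequency threshold; the source record does not define them.
    """
    node = IndexNode()
    raw_counts = Counter(json.dumps(s, sort_keys=True) for s in samples)
    raw, count = raw_counts.most_common(1)[0]
    if count / len(samples) >= raw_ratio:
        node.raw.append(raw)              # raw-string level
        return node
    key_counts = Counter(k for s in samples for k in s)
    node.keys = sorted(key_counts)        # first level
    pair_counts, nested = Counter(), {}
    for s in samples:
        for k, v in s.items():
            if isinstance(v, dict):
                nested.setdefault(k, []).append(v)
            else:
                pair_counts[(k, json.dumps(v))] += 1
    node.pairs = sorted(                  # second level
        p for p, c in pair_counts.items() if c / key_counts[p[0]] >= pair_ratio
    )
    node.children = {k: build_index(vs, raw_ratio, pair_ratio)
                     for k, vs in nested.items()}
    return node


def compress(row, node):
    """Encode a row as tagged lists that point into the index tree."""
    raw = json.dumps(row, sort_keys=True)
    if raw in node.raw:
        return ["R", node.raw.index(raw)]
    out = []
    for k, v in row.items():
        if isinstance(v, dict) and k in node.children:
            out.append(["C", node.keys.index(k), compress(v, node.children[k])])
        elif (k, json.dumps(v)) in node.pairs:
            out.append(["P", node.pairs.index((k, json.dumps(v)))])
        elif k in node.keys:
            out.append(["K", node.keys.index(k), v])
        else:
            out.append(["L", k, v])       # literal fallback: not indexed
    return ["N", out]


def decompress(code, node):
    """Invert compress() using the same index tree."""
    if code[0] == "R":
        return json.loads(node.raw[code[1]])
    row = {}
    for item in code[1]:
        if item[0] == "C":
            key = node.keys[item[1]]
            row[key] = decompress(item[2], node.children[key])
        elif item[0] == "P":
            key, value = node.pairs[item[1]]
            row[key] = json.loads(value)
        elif item[0] == "K":
            row[node.keys[item[1]]] = item[2]
        else:
            row[item[1]] = item[2]
    return row


# Hypothetical sample rows, loosely modeled on a search interaction log.
rows = [
    {"event": "click", "market": "en-US", "page": {"type": "serp", "lang": "en"}},
    {"event": "query", "market": "en-US", "page": {"type": "serp", "lang": "en"}},
    {"event": "click", "market": "en-GB", "page": {"type": "serp", "lang": "en"}},
    {"event": "click", "market": "en-US", "page": {"type": "home", "lang": "en"}},
]
tree = build_index(rows)
print(compress(rows[0], tree))
# → ['N', [['P', 0], ['P', 1], ['C', 2, ['R', 0]]]]
```

In this sketch the nested `page` node is indexed at the raw-string level because one serialization dominates its samples, while the top-level node falls through to the key and key-value levels. Rows restored from a raw-string entry come back with sorted key order, so the round trip preserves content but not necessarily the original byte layout.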