Title of invention: COMPUTER SYSTEM PROGRAMMED TO IDENTIFY COMMON SUBSEQUENCES IN LOGS
Abstract: A data processing method includes receiving a stream of digital data with a plurality of objects and, in response to receiving an object, tokenizing the object to create a tokenized object and storing the tokenized object in a token database. The method further includes comparing the tokenized object to a plurality of other tokenized objects stored in the token database, computing a pattern associated with the tokenized object, storing the pattern in a pattern database, and managing a size of the pattern database by identifying a subset of patterns that are eligible for deletion from the pattern database based on an age of each pattern, ranking each pattern of the subset based on a quality metric and a popularity metric, identifying, based on the ranking and from the subset, a second pattern, and deleting the second pattern from the pattern database to produce an updated database.
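The tokenize-and-compare step described in the abstract can be illustrated with a minimal sketch. The record does not specify the tokenizer or how a pattern is computed; the whitespace tokenizer and the use of a longest common subsequence (LCS) over tokens are assumptions suggested only by the title, and the function names `tokenize` and `common_subsequence` are hypothetical.

```python
from typing import List


def tokenize(line: str) -> List[str]:
    """Split a log line into tokens on whitespace (assumed tokenizer)."""
    return line.split()


def common_subsequence(a: List[str], b: List[str]) -> List[str]:
    """Return one longest common subsequence of two token lists.

    Assumption: the "pattern" is modeled here as an LCS of tokens;
    the source record does not define the pattern computation.
    """
    m, n = len(a), len(b)
    # dp[i][j] = length of the LCS of a[i:] and b[j:]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = 1 + dp[i + 1][j + 1]
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])

    # Walk the table from the start to recover the subsequence itself.
    out: List[str] = []
    i = j = 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


first = tokenize("user alice logged in from 10.0.0.1")
second = tokenize("user bob logged in from 10.0.0.2")
pattern = common_subsequence(first, second)
```

In this sketch the tokens shared by both lines, in order, form the candidate pattern, while the variable parts (user name, address) drop out.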
Publication number: US2017091190(A1) Publication date: 2017.03.30
Application number: US201514869859 Application date: 2015.09.29
Applicant: Cisco Technology, Inc. Inventors: ATTIAS ROBERTO; Prieto Alberto Gonzalez
Classification: G06F17/30 Main classification: G06F17/30
Agency: Agent:
Principal claim: 1. A method comprising: using a computer, receiving a stream of digital data comprising a plurality of objects; using programmed tokenizer instructions executed using the computer, in response to receiving a first object of the plurality of objects, tokenizing the first object to create a first tokenized object and electronically digitally storing the first tokenized object in a token database that comprises a plurality of other tokenized objects and using an electronic digital storage device; using the computer, comparing the first tokenized object to the plurality of other tokenized objects stored in the token database, computing a first pattern associated with the first tokenized object, and storing the first pattern in a pattern database that comprises a plurality of patterns; using the computer, managing a size of the pattern database by: identifying, from the plurality of patterns, a subset of patterns that are eligible for deletion from the pattern database based on an age of each pattern and storing in computer memory data identifying the subset of patterns; ranking each pattern of the subset based on a quality metric and a popularity metric, by marking the data identifying the subset of patterns with rank values; identifying, based on the ranking and from the subset, a second pattern and deleting the second pattern from the pattern database to produce an updated database; repeating the tokenizing, comparing and storing using the updated database; wherein the method is executed using one or more computing devices.
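The size-management steps of the claim (age-based eligibility, ranking by quality and popularity, deletion) might be sketched as follows. The claim does not define how age determines eligibility or how the two metrics are combined, so everything specific here is an assumption: the `Pattern` fields, the `min_age` and `max_size` parameters, the rule that only patterns at least `min_age` old are eligible, and the lexicographic ordering on (quality, popularity).

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Pattern:
    tokens: Tuple[str, ...]
    created_at: float  # hypothetical creation timestamp, in seconds
    quality: float     # hypothetical quality metric, higher is better
    popularity: int    # hypothetical popularity metric, e.g. match count


def prune(db: Dict[Tuple[str, ...], Pattern], now: float,
          min_age: float, max_size: int) -> List[Pattern]:
    """Delete low-ranked patterns until the database fits max_size.

    Assumptions: a pattern is eligible once it is at least min_age old,
    and the lowest (quality, popularity) pair is deleted first.
    """
    deleted: List[Pattern] = []
    while len(db) > max_size:
        # Step 1: identify the subset eligible for deletion, based on age.
        eligible = [p for p in db.values() if now - p.created_at >= min_age]
        if not eligible:
            break  # nothing old enough to delete yet
        # Step 2: rank the subset on quality, then popularity.
        ranked = sorted(eligible, key=lambda p: (p.quality, p.popularity))
        # Step 3: delete the lowest-ranked pattern to update the database.
        victim = ranked[0]
        del db[victim.tokens]
        deleted.append(victim)
    return deleted


p1 = Pattern(("user", "logged", "in"), created_at=0.0, quality=0.9, popularity=10)
p2 = Pattern(("disk", "full"), created_at=0.0, quality=0.2, popularity=3)
p3 = Pattern(("new", "event"), created_at=95.0, quality=0.1, popularity=1)
db = {p.tokens: p for p in (p1, p2, p3)}
removed = prune(db, now=100.0, min_age=10.0, max_size=2)
```

In this example `p3` has the weakest metrics but is too recent to be eligible, so `p2` is removed instead; other eligibility or ranking rules would fit the claim language equally well.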
Address: San Jose, CA, US