Title of Invention |
COMPUTER SYSTEM PROGRAMMED TO IDENTIFY COMMON SUBSEQUENCES IN LOGS |
Abstract |
A data processing method includes receiving a stream of digital data with a plurality of objects and, in response to receiving an object, tokenizing the object to create a tokenized object and storing the tokenized object in a token database. The method further includes comparing the tokenized object to a plurality of other tokenized objects stored in the token database, computing a pattern associated with the tokenized object, and storing the pattern in a pattern database. The method also includes managing a size of the pattern database by identifying a subset of patterns that are eligible for deletion from the pattern database based on an age of each pattern, ranking each pattern of the subset based on a quality metric and a popularity metric, identifying, based on the ranking and from the subset, a second pattern, and deleting the second pattern from the pattern database to produce an updated database. |
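The tokenize, compare, and compute-pattern steps in the abstract can be illustrated with a minimal Python sketch. The record does not say how objects are tokenized or how a pattern is derived, so everything here is an assumption: whitespace tokenization, a pattern formed from a common subsequence of tokens with a hypothetical `*` wildcard marking each gap, and the names `tokenize`, `common_pattern`, and `ingest`. The standard-library `difflib.SequenceMatcher` is used for convenience; it finds a common subsequence of matching blocks, not necessarily the longest one.

```python
from difflib import SequenceMatcher

WILDCARD = "*"  # hypothetical placeholder for the parts that differ


def tokenize(line):
    """Split a log line on whitespace (an assumed tokenization rule)."""
    return line.split()


def common_pattern(a, b):
    """Keep tokens that both lists share in order; mark each gap with one wildcard."""
    pattern = []
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "equal":
            pattern.extend(a[i1:i2])
        elif not pattern or pattern[-1] != WILDCARD:
            pattern.append(WILDCARD)
    return pattern


def ingest(line, token_db, pattern_db):
    """Tokenize one object, compare it to stored objects, and record patterns.

    token_db is a plain list and pattern_db a dict of pattern -> count;
    both stand in for the databases named in the abstract.
    """
    tokens = tokenize(line)
    for other in token_db:
        pattern = tuple(common_pattern(tokens, other))
        # Skip patterns with no shared token at all.
        if any(tok != WILDCARD for tok in pattern):
            pattern_db[pattern] = pattern_db.get(pattern, 0) + 1
    # Store after comparing so the object is not compared with itself.
    token_db.append(tokens)
    return tokens
```

Under these assumptions, ingesting `user alice logged in from 10.0.0.1` and then `user bob logged in from 10.0.0.2` records the single pattern `user * logged in from *`.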
Application Publication Number |
US2017091190(A1) |
Application Publication Date |
2017.03.30 |
Application Number |
US201514869859 |
Filing Date |
2015.09.29 |
Applicant |
Cisco Technology, Inc. |
Inventors |
ATTIAS ROBERTO;Prieto Alberto Gonzalez |
Classification Number |
G06F17/30 |
Main Classification Number |
G06F17/30 |
Agency |
|
Agent |
|
Principal Claim |
1. A method comprising:
using a computer, receiving a stream of digital data comprising a plurality of objects;
using programmed tokenizer instructions executed using the computer, in response to receiving a first object of the plurality of objects, tokenizing the first object to create a first tokenized object and electronically digitally storing the first tokenized object in a token database that comprises a plurality of other tokenized objects and using an electronic digital storage device;
using the computer, comparing the first tokenized object to the plurality of other tokenized objects stored in the token database, computing a first pattern associated with the first tokenized object, and storing the first pattern in a pattern database that comprises a plurality of patterns;
using the computer, managing a size of the pattern database by:
identifying, from the plurality of patterns, a subset of patterns that are eligible for deletion from the pattern database based on an age of each pattern and storing in computer memory data identifying the subset of patterns;
ranking each pattern of the subset based on a quality metric and a popularity metric, by marking the data identifying the subset of patterns with rank values;
identifying, based on the ranking and from the subset, a second pattern and deleting the second pattern from the pattern database to produce an updated database;
repeating the tokenizing, comparing and storing using the updated database;
wherein the method is executed using one or more computing devices. |
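The size-management steps of the claim (age-based eligibility, ranking by quality and popularity, deletion) can be sketched in Python as follows. The claim does not define the age rule, the two metrics, or how they combine into a rank, so this sketch assumes that only patterns at least `min_age` seconds old are eligible, that the lowest `(quality, popularity)` pair ranks first for deletion, and that deletion continues until the database fits `max_size`. The `Pattern` fields and the `prune` function are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass
class Pattern:
    text: str
    created_at: float  # creation time in seconds (assumed representation)
    quality: float     # hypothetical quality score, higher is better
    popularity: int    # hypothetical count of objects matching the pattern


def prune(patterns, now, min_age, max_size):
    """Return an updated pattern list that holds at most max_size entries
    when enough patterns are eligible for deletion."""
    excess = len(patterns) - max_size
    if excess <= 0:
        return list(patterns)

    # Step 1: age-based eligibility. Assumption: young patterns are
    # protected so that they have time to accumulate popularity.
    eligible = [p for p in patterns if now - p.created_at >= min_age]

    # Step 2: rank the eligible subset. Assumption: compare quality
    # first and break ties on popularity; worst candidates come first.
    ranked = sorted(eligible, key=lambda p: (p.quality, p.popularity))

    # Step 3: delete the lowest-ranked patterns.
    doomed = {id(p) for p in ranked[:excess]}
    return [p for p in patterns if id(p) not in doomed]
```

If fewer patterns are eligible than the excess requires, this sketch deletes only the eligible ones and leaves the database over its target size; a real system would need a policy for that case, which the claim does not describe.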
Address |
San Jose CA US |