Title of Invention |
Efficient string search |
Abstract |
Some embodiments of an efficient string search have been presented. In one embodiment, a string of bytes representing content written in a non-delimited language is received, wherein the content has been classified into a predetermined category. In a single pass through the string of bytes, a set of N-grams is searched for simultaneously. Statistical information on occurrences of the N-grams, if any, in the string of bytes is collected. In some embodiments, a model is generated based on the statistical information, where the model is usable by a content filter to classify content. |
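The single-pass, simultaneous search described in the abstract can be illustrated with a multi-pattern automaton. The following is a minimal sketch, assuming an Aho-Corasick style construction over raw bytes; the record does not state which automaton the embodiments use, and the names `build_automaton` and `count_ngrams` are hypothetical.

```python
from collections import Counter, deque


def build_automaton(ngrams):
    """Build a byte-level Aho-Corasick automaton (an assumed construction)."""
    goto = [{}]   # goto[state][byte] -> next state
    fail = [0]    # fallback state used when no transition exists
    out = [[]]    # N-grams that end at each state
    for gram in ngrams:
        state = 0
        for b in gram:
            nxt = goto[state].get(b)
            if nxt is None:
                goto.append({})
                fail.append(0)
                out.append([])
                nxt = len(goto) - 1
                goto[state][b] = nxt
            state = nxt
        out[state].append(gram)
    # Breadth-first pass to fill in the failure links.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for b, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and b not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(b, 0)
            # Inherit matches that end at the fallback state.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def count_ngrams(data, automaton):
    """Count occurrences of every N-gram in one pass over the bytes."""
    goto, fail, out = automaton
    counts = Counter()
    state = 0
    for b in data:
        while state and b not in goto[state]:
            state = fail[state]
        state = goto[state].get(b, 0)
        for gram in out[state]:
            counts[gram] += 1
    return counts


# Usage: N-grams from a non-delimited language, encoded as UTF-8 bytes.
grams = ["中文".encode("utf-8"), "文本".encode("utf-8")]
stats = count_ngrams("中文文本中文".encode("utf-8"), build_automaton(grams))
```

Each input byte is read exactly once, so the scan itself does not slow down as more N-grams are added to the set; only the automaton grows.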
Publication Number |
US9542387(B2) |
Publication Date |
2017.01.10 |
Application Number |
US201414326230 |
Filing Date |
2014.07.08 |
Applicant |
DELL SOFTWARE INC. |
Inventors |
Raffill Thomas E.;Zhu Shunhui;Yanovsky Roman;Yanovsky Boris;Gmuender John |
Classification Numbers |
G10L21/00;G06F17/28;G06F17/27;G06F17/30 |
Main Classification Number |
G10L21/00 |
Agency |
Polsinelli LLP |
Agent |
Polsinelli LLP |
Main Claim |
1. A method for classifying content written in a non-delimited language, the method comprising:
receiving a string of bytes at a finite state machine (FSM), wherein the string of bytes is received by electronic hardware associated with the FSM after a user attempts to access information related to the string of bytes;
performing a string search on the string of bytes, wherein the string search identifies that the string of bytes includes a set of N-grams that match one or more states in a set of states, wherein the FSM connects the one or more states in the set of states;
collecting statistical information regarding the set of N-grams received in the string of bytes, wherein the collected statistical information corresponds to a condition in a model stored in a model repository;
receiving the model from the model repository;
identifying that the one or more states and that one or more N-grams in the set of N-grams correspond to the received model;
identifying that a length of the received string of bytes also corresponds to the condition in the received model when the length of the received string of bytes is of a certain length;
classifying content of the one or more N-grams as being prohibited according to the one or more states, the one or more N-grams, and the length that corresponds to the condition; and
denying access to the content when the classification of the content is prohibited, wherein denying access to the content prevents the content from being displayed on a display accessible to the user after the user attempts to access the information relating to the string of bytes. |
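The classification and access-denial steps of the claim can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the `Model` fields, the per-N-gram minimum counts, and the minimum-length rule are hypothetical stand-ins for the "condition in a model", and a naive per-N-gram scan is swapped in for the FSM search to keep the example short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    """Hypothetical model: a condition on N-gram counts and input length."""
    category: str
    min_length: int    # assumed form of the length condition
    min_counts: dict   # N-gram (bytes) -> minimum number of occurrences


def count_overlapping(data, gram):
    """Naive overlapping count; stands in for the FSM-based search."""
    count = 0
    start = data.find(gram)
    while start != -1:
        count += 1
        start = data.find(gram, start + 1)
    return count


def is_prohibited(data, model):
    """Check the collected statistics and the length against the model."""
    if len(data) < model.min_length:
        return False
    return all(
        count_overlapping(data, gram) >= needed
        for gram, needed in model.min_counts.items()
    )


def handle_request(data, model):
    """Deny access when the content is classified as prohibited."""
    return "denied" if is_prohibited(data, model) else "allowed"


# Usage with a hypothetical model retrieved from a model repository.
model = Model(category="prohibited", min_length=6, min_counts={b"ab": 2})
decision = handle_request(b"abxxab", model)
```

In this sketch a request is denied only when both the N-gram statistics and the length satisfy the model's condition, mirroring the two identifying steps that precede classification in the claim.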
Address |
Round Rock TX US |