Title of Invention |
METHOD OF AUTOMATED ANALYSIS OF TEXT DOCUMENTS |
Abstract |
Automated analysis of text documents is used to scan text documents for phrases or text fragments taken from other documents, or for modified versions of such fragments. The comparatively fast and universally applicable method finds phrases, sentences, or longer text fragments from other documents. The method includes: converting all electronic files containing model documents to a given format; extracting meaningful fragments, called “clauses”, from them; storing the converted files containing model documents in the database; converting each electronic file containing a document to be analyzed to the given format; comparing clauses extracted from analyzed documents with clauses extracted from model documents; calculating the fractions of clauses from an analyzed document that match clauses from each model document; and comparing the fractions found with a pre-set threshold value to determine whether the analyzed document contains text fragments from a model document. |
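The comparison and threshold steps of the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the sentence-based clause splitter, the exact-match comparison after whitespace and case normalization, and the names `extract_clauses`, `matching_fraction`, and `find_matching_models` are all assumptions made for illustration. The fraction is taken over the clauses of the analyzed document, which is one reading of the abstract.

```python
import re


def extract_clauses(text):
    """Split text into normalized clauses.

    Assumption: a "clause" is approximated by a sentence-like span
    delimited by ., !, ? or ; (the record does not define the splitter).
    """
    clauses = set()
    for part in re.split(r"[.!?;]+", text):
        normalized = " ".join(part.lower().split())
        if normalized:
            clauses.add(normalized)
    return clauses


def matching_fraction(analyzed_clauses, model_clauses):
    """Fraction of the analyzed document's clauses found in the model."""
    if not analyzed_clauses:
        return 0.0
    return len(analyzed_clauses & model_clauses) / len(analyzed_clauses)


def find_matching_models(analyzed_text, model_texts, threshold=0.5):
    """Return {model name: fraction} for models at or above the threshold.

    `threshold` stands in for the pre-set threshold value; 0.5 is an
    arbitrary illustrative default.
    """
    analyzed = extract_clauses(analyzed_text)
    hits = {}
    for name, text in model_texts.items():
        fraction = matching_fraction(analyzed, extract_clauses(text))
        if fraction >= threshold:
            hits[name] = fraction
    return hits
```

In practice the model clauses would be extracted once and stored, rather than recomputed per comparison as in this sketch.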
Publication Number |
US2014324416(A1) |
Publication Date |
2014.10.30 |
Application Number |
US201214350292 |
Application Date |
2012.11.16 |
Applicant |
OBSHCHESTVO S OGRANICHENNOY OTVETSTVENNOST'YU "TSENTR INNOVATSIY NATAL'I KASPERSKOY" |
Inventors |
Lapshin Vladimir Anatol'yevich;Perov Dmitriy Vsevolodovich;Pshekhotskaya Yekaterina Aleksandrovna |
Classification Number |
G06F17/27 |
Main Classification Number |
G06F17/27 |
Agency |
|
Agent |
|
Principal Claim |
1. A method of automated analysis of text documents, the method comprising:
converting electronic files containing model documents to a predefined format that is capable of representing all characters in all languages in which the text documents are written; extracting clauses representing meaningful fragments from the model documents; storing the converted files containing model documents in a database; converting each electronic file containing a document to be analyzed into the predefined format; comparing clauses extracted from the analyzed documents with clauses extracted from the model documents; calculating fractions of clauses from one of the analyzed documents matching clauses from each model document; comparing the fractions with a pre-set threshold value to identify whether the analyzed document contains text fragments from at least one of the model documents. |
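The conversion step in the claim calls for a predefined format able to represent every character of the documents' languages. The claim does not name that format; the sketch below assumes Unicode text in NFC normalization form as one possible choice, and the function name `to_predefined_format` and its `source_encoding` parameter are hypothetical.

```python
import unicodedata


def to_predefined_format(raw: bytes, source_encoding: str = "utf-8") -> str:
    """Decode a file's bytes and normalize them to one canonical form.

    Assumption: the "predefined format" is Unicode text normalized to NFC,
    so visually identical strings compare equal regardless of the
    encoding or composition used in the source file.
    """
    # Undecodable bytes become U+FFFD instead of raising an error.
    text = raw.decode(source_encoding, errors="replace")
    return unicodedata.normalize("NFC", text)
```

Normalizing both model and analyzed documents the same way is what makes a later clause-by-clause comparison meaningful across files saved in different legacy encodings.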
Address |
Moscow RU |