Title of invention: METHOD OF AUTOMATED ANALYSIS OF TEXT DOCUMENTS
Abstract: Automated analysis of text documents is used to scan text documents in order to find phrases or text fragments taken from other documents, or modified versions of such fragments. The method is comparatively fast and universally applicable, and finds phrases, sentences or even whole text fragments taken from other documents. The method includes the following steps: all electronic files containing model documents are converted to a given format; meaningful fragments, called “clauses”, are extracted from them; the converted files containing model documents are stored in a database; each electronic file containing a document to be analyzed is converted to the given format; clauses extracted from the analyzed documents are compared with clauses extracted from the model documents; for each model document, the fraction of the analyzed document's clauses that match that model document's clauses is calculated; the fractions found are then compared with a preset threshold value to determine whether the analyzed document contains text fragments from a model document.
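The clause-matching and threshold steps in the abstract can be illustrated with a short Python sketch. The abstract does not say how clauses are delimited or compared, so everything below is an assumption: a clause is taken to be the text between sentence-final punctuation, semicolons or line breaks; clauses are normalized by Unicode NFC, lowercasing and whitespace collapsing; a match is exact equality of normalized clauses; and the fraction is the share of the analyzed document's distinct clauses that also occur in a given model document. The function names `extract_clauses`, `match_fractions` and `find_sources`, and the default threshold of 0.3, are hypothetical.

```python
import re
import unicodedata

# Assumed clause delimiters: sentence-final punctuation, semicolons and line breaks.
CLAUSE_SPLIT = re.compile(r"[.!?;\n]+")


def extract_clauses(text: str) -> set[str]:
    """Return the distinct normalized clauses of a text (NFC, lowercase, single spaces)."""
    text = unicodedata.normalize("NFC", text)
    clauses = set()
    for part in CLAUSE_SPLIT.split(text):
        clause = " ".join(part.lower().split())
        if clause:  # skip empty pieces left between consecutive delimiters
            clauses.add(clause)
    return clauses


def match_fractions(analyzed: str, models: dict[str, str]) -> dict[str, float]:
    """For each model document, the share of the analyzed document's clauses it contains."""
    analyzed_clauses = extract_clauses(analyzed)
    fractions = {}
    for name, model_text in models.items():
        if not analyzed_clauses:
            fractions[name] = 0.0  # an empty analyzed document matches nothing
            continue
        matched = analyzed_clauses & extract_clauses(model_text)
        fractions[name] = len(matched) / len(analyzed_clauses)
    return fractions


def find_sources(analyzed: str, models: dict[str, str], threshold: float = 0.3) -> list[str]:
    """Names of model documents whose clause fraction reaches the (assumed) threshold."""
    fractions = match_fractions(analyzed, models)
    return sorted(name for name, fraction in fractions.items() if fraction >= threshold)


if __name__ == "__main__":
    models = {
        "nda": "The parties agree to keep this confidential. Term is two years.",
        "lease": "Rent is due monthly. The tenant pays utilities.",
    }
    analyzed = "Term is two years. Payment is due on delivery."
    print(match_fractions(analyzed, models))  # {'nda': 0.5, 'lease': 0.0}
    print(find_sources(analyzed, models))     # ['nda']
```

Here one of the two clauses in the analyzed text occurs in the "nda" model, giving a fraction of 0.5, which clears the assumed 0.3 threshold. Because clauses are kept in sets, a clause repeated in the analyzed document counts once. Exact matching like this will not detect modified fragments; that would require a similarity measure between clauses, which this sketch does not attempt.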
Publication number: US2014324416(A1); Publication date: 2014.10.30
Application number: US201214350292; Application date: 2012.11.16
Applicant: OBSHCHESTVO S OGRANICHENNOY OTVETSTVENNOST'YU "TSENTR INNOVATSIY NATAL'I KASPERSKOY"; Inventors: Lapshin Vladimir Anatol'yevich; Perov Dmitriy Vsevolodovich; Pshekhotskaya Yekaterina Aleksandrovna
Classification: G06F17/27; Main classification: G06F17/27
Main claim: 1. A method of automated analysis of text documents, the method comprising: converting electronic files containing model documents to a predefined format that is capable of representing all characters in all languages in which the text documents are written; extracting clauses representing meaningful fragments from the model documents; storing the converted files containing model documents in a database; converting each electronic file containing a document to be analyzed into the predefined format; comparing clauses extracted from the analyzed documents with clauses extracted from the model documents; calculating fractions of clauses from one of the analyzed documents matching clauses from each model document; and comparing the fractions with a pre-set threshold value to identify whether the analyzed document contains text fragments from at least one of the model documents.
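The conversion and storage steps of the claim can be sketched as follows. The claim only requires a format able to represent all characters of all the languages involved; this sketch assumes that format is NFC-normalized Unicode text, that input files arrive as raw bytes in one of a few hypothetical candidate encodings (`utf-8-sig`, `cp1251`, `latin-1`), and that the database is an SQLite table named `models`. The names `to_predefined_format`, `store_model` and `load_models` are illustrative, not taken from the patent.

```python
import sqlite3
import unicodedata

# Hypothetical candidate encodings, tried in order. latin-1 maps every byte,
# so it acts as a last-resort fallback that never raises.
CANDIDATE_ENCODINGS = ("utf-8-sig", "cp1251", "latin-1")


def to_predefined_format(raw: bytes) -> str:
    """Decode raw file bytes and return NFC-normalized Unicode text (the assumed format)."""
    for encoding in CANDIDATE_ENCODINGS:
        try:
            return unicodedata.normalize("NFC", raw.decode(encoding))
        except UnicodeDecodeError:
            continue
    # Only reachable if the candidate list is changed to drop latin-1.
    raise ValueError("no candidate encoding could decode the input")


def store_model(conn: sqlite3.Connection, name: str, raw: bytes) -> None:
    """Convert a model document and store it in the assumed `models` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS models (name TEXT PRIMARY KEY, body TEXT NOT NULL)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO models (name, body) VALUES (?, ?)",
        (name, to_predefined_format(raw)),
    )
    conn.commit()


def load_models(conn: sqlite3.Connection) -> dict[str, str]:
    """Return all stored model documents as {name: converted text}."""
    return dict(conn.execute("SELECT name, body FROM models"))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store_model(conn, "ru", "Срок действия два года.".encode("cp1251"))
    store_model(conn, "en", "Term is two years.".encode("utf-8"))
    print(load_models(conn))
```

Trying `latin-1` last guarantees that decoding never fails, but it can silently mis-decode text in an unexpected encoding. A production converter would need proper encoding detection and format-specific text extraction (for example, from word-processor files), which this sketch leaves out.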
Address: Moscow, RU