Title of invention: Method and apparatus for statistical text filtering
Abstract: Disclosed herein is a method for automatically filtering a corpus of documents containing textual and non-textual information of a natural language. In a first dividing step (101), the document corpus is divided into appropriate portions. In a subsequent determining step (105), a regularity value (V_R) is determined for each portion of the document corpus; this value measures how well the portion conforms to character-sequence probabilities predetermined for the language considered. In a comparing step (107), each regularity value (V_R) is compared with a threshold value (V_T) to decide whether the conformity is sufficient. Finally, in a rejecting step (111), any portion of the document corpus whose conformity is not sufficient is rejected and removed from the corpus. An apparatus for carrying out such a method is also disclosed.
Publication number: EP1229454(A2) Publication date: 2002.08.07
Application number: EP20010480087 Filing date: 2001.09.13
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION Inventor: CREPY, HUBERT
Classification: G06F17/27; G06F17/28; (IPC1-7): G06F17/27 Main classification: G06F17/27
Agency: Agent:
Main claim:
Address:
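
The four steps named in the abstract (dividing 101, determining 105, comparing 107, rejecting 111) can be illustrated with a minimal sketch. The abstract does not say how the character-sequence probabilities are estimated, how portions are delimited, or how the threshold is chosen, so everything below is an assumption made for illustration: a character-bigram model with add-one smoothing, one portion per non-empty line, the average log-probability as the regularity value V_R, and a caller-supplied threshold V_T. All function names are hypothetical and do not come from the patent.

```python
import math
from collections import Counter


def train_bigram_model(reference_text):
    """Estimate character-bigram log-probabilities from known-good text.

    Assumption: a bigram model stands in for the "predetermined
    character-sequence probabilities" mentioned in the abstract.
    """
    pairs = Counter(zip(reference_text, reference_text[1:]))
    firsts = Counter(reference_text[:-1])
    # +1 reserves probability mass for characters never seen in training.
    vocab = len(set(reference_text)) + 1

    def log_prob(a, b):
        # Add-one smoothing: unseen pairs get a small, nonzero probability.
        return math.log((pairs[(a, b)] + 1) / (firsts[a] + vocab))

    return log_prob


def regularity_value(portion, log_prob):
    """Determining step: average bigram log-probability of a portion (V_R)."""
    if len(portion) < 2:
        return float("-inf")  # too short to score; treated as non-conforming
    total = sum(log_prob(a, b) for a, b in zip(portion, portion[1:]))
    return total / (len(portion) - 1)


def filter_corpus(corpus, log_prob, threshold):
    """Divide, score, compare, and reject; returns the portions that are kept."""
    # Dividing step: one portion per non-empty line (an assumption).
    portions = [line for line in corpus.splitlines() if line.strip()]
    # Comparing and rejecting steps: keep a portion only if V_R >= V_T.
    return [p for p in portions if regularity_value(p, log_prob) >= threshold]


if __name__ == "__main__":
    reference = (
        "the quick brown fox jumps over the lazy dog. "
        "the cat sat on the mat. "
    ) * 20
    model = train_bigram_model(reference)
    corpus = "the cat sat on the mat.\nx9$#@!qz%%^&\n"
    for portion in filter_corpus(corpus, model, threshold=-3.0):
        print(portion)
```

In practice the reference text would be a large, clean sample of the target language, and the threshold would be tuned on held-out data; the tiny reference string and the value -3.0 here exist only to make the sketch runnable.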