Title of invention: Method and apparatus for statistical text filtering
Abstract: Disclosed herein is a method for automatically filtering a corpus of documents containing textual and non-textual information in a natural language. According to the method, in a first dividing step (101), the document corpus is divided into appropriate portions. In a subsequent determining step (105), a regularity value (VR) is determined for each portion of the document corpus; it measures the conformity of the portion with character-sequence probabilities predetermined for the language considered. In a comparing step (107), each regularity value (VR) is then compared with a threshold value (VT) to decide whether the conformity is sufficient. Finally, in a rejecting step (111), any portion of the document corpus whose conformity is not sufficient is rejected and removed from the corpus. An apparatus for carrying out such a method is also disclosed.
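The four steps named in the abstract can be illustrated with a minimal sketch. The abstract does not say how the portions are chosen or how VR is computed, so everything specific here is an assumption made for illustration: portions are taken to be lines, the character-sequence probabilities come from a character-bigram model with add-one smoothing trained on a reference text, and VR is the mean log-probability per character transition. The function names (`train_bigram_model`, `regularity_value`, `filter_corpus`) are hypothetical and are not taken from the patent.

```python
import math
from collections import Counter


def train_bigram_model(reference_text):
    """Count character bigrams in known-good text (assumed source of the
    'predetermined' character-sequence probabilities)."""
    pair_counts = Counter(zip(reference_text, reference_text[1:]))
    first_counts = Counter(reference_text[:-1])
    alphabet_size = len(set(reference_text))
    return pair_counts, first_counts, alphabet_size


def regularity_value(portion, model):
    """Hypothetical VR: mean log-probability per character transition."""
    pair_counts, first_counts, alphabet_size = model
    if len(portion) < 2:
        return float("-inf")  # too short to score, treated as non-conforming
    total = 0.0
    for a, b in zip(portion, portion[1:]):
        # Add-one smoothing over the reference alphabet, so unseen pairs
        # get a small non-zero probability instead of log(0).
        p = (pair_counts[(a, b)] + 1) / (first_counts[a] + alphabet_size)
        total += math.log(p)
    return total / (len(portion) - 1)


def filter_corpus(corpus, model, threshold):
    """Keep only the portions whose VR reaches the threshold VT."""
    portions = corpus.splitlines()               # dividing step (101), assumed per line
    kept = []
    for portion in portions:
        vr = regularity_value(portion, model)    # determining step (105)
        if vr >= threshold:                      # comparing step (107)
            kept.append(portion)
        # otherwise the portion is dropped: rejecting step (111)
    return kept


# Toy reference text; a real model would need a much larger sample.
reference = "the cat sat on the mat and the dog sat on the log " * 20
model = train_bigram_model(reference)
corpus = "the cat sat on the mat\nx#9q@@z%%8kk\nthe dog sat on the log"
kept = filter_corpus(corpus, model, threshold=-2.0)
```

In this sketch the threshold of -2.0 is an arbitrary value picked for the toy reference text; in practice VT would have to be tuned against the model and the corpus at hand.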
Publication number: US6879722(B2)
Publication date: 2005.04.12
Application number: US20010895562
Filing date: 2001.06.29
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: CREPY HUBERT
Classification: G06F17/27; G06F17/28; (IPC1-7): G06K9/72
Main classification: G06F17/27
Agency:
Agent:
Principal claim:
Address: