Title of invention |
SYSTEM AND METHOD FOR TEXT CLEANING |
Abstract |
A method and system for cleaning an electronic document are provided. The method comprises: identifying at least one sentence in the electronic document; numerically representing features of the sentence to obtain a numeric feature representation associated with the sentence; inputting the numeric feature representation into a machine learning classifier, the machine learning classifier being configured to determine, based on each numeric feature representation, whether the sentence associated with that numeric feature representation is a bad sentence; and removing sentences determined to be bad sentences from the electronic document to create a cleaned document. |
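The four steps named in the abstract (identify sentences, build a numeric feature representation, classify, remove bad sentences) can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not say which features or which classifier are used, so the three binary features, the perceptron, the tiny training set, and every function name here are hypothetical stand-ins rather than the method claimed in the application.

```python
import re

def split_sentences(text):
    """Naive sentence identification: break after . ! ? or at line breaks."""
    parts = re.split(r"(?<=[.!?])\s+|\n+", text.strip())
    return [p for p in parts if p]

def featurize(sentence):
    """Numeric feature representation (hypothetical features, not from the source)."""
    words = sentence.split()
    visible = [c for c in sentence if not c.isspace()]
    non_letters = sum(1 for c in visible if not c.isalpha())
    ends_ok = 1.0 if sentence.rstrip().endswith((".", "!", "?")) else 0.0
    long_enough = 1.0 if len(words) >= 4 else 0.0
    symbol_heavy = 1.0 if visible and non_letters / len(visible) > 0.2 else 0.0
    return [ends_ok, long_enough, symbol_heavy]

def score(x, model):
    """Linear score of a feature vector under a (weights, bias) model."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def train_perceptron(samples, labels, epochs=20, lr=1.0):
    """Assumed classifier: a plain perceptron. Label 1 = bad sentence, 0 = good."""
    w, b = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            pred = 1 if score(x, (w, b)) > 0 else 0
            err = y - pred
            if err:  # update only on a misclassified sample
                w = [wi + lr * err * xi for wi, xi in zip(w, x)]
                b += lr * err
    return w, b

def is_bad(sentence, model):
    return score(featurize(sentence), model) > 0

def clean_document(text, model):
    """Drop sentences classified as bad and rejoin the rest."""
    return " ".join(s for s in split_sentences(text) if not is_bad(s, model))

# Hypothetical labelled examples (1 = bad, 0 = good), invented for this sketch.
TRAINING = [
    ("The classifier removes noisy sentences from the page.", 0),
    ("Home | About | Contact", 1),
    ("Click here.", 1),
    ("Each sentence is converted into a numeric feature vector.", 0),
    ("Price: $9.99 *** 50% off!!! #1.", 1),
    ("Read more", 1),
]
model = train_perceptron([featurize(s) for s, _ in TRAINING],
                         [y for _, y in TRAINING])

document = (
    "Home | About | Contact\n"
    "The method identifies each sentence in a document. "
    "Click here. "
    "A classifier then labels every sentence as good or bad.\n"
    "Read more"
)
cleaned = clean_document(document, model)
print(cleaned)
```

With this toy training set the navigation bar, the "Click here." fragment, and the "Read more" link are dropped, while the two full sentences are kept. A practical system would need a much larger labelled corpus and a richer feature set than the three flags assumed here.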
Application publication number |
WO2011044658(A1) |
Application publication date |
2011.04.21 |
Application number |
WO2010CA00668 |
Filing date |
2010.05.07 |
Applicant |
2167959 ONTARIO INC.;XU, LIQIN;LEE, HYUN, CHUL |
Inventor |
XU, LIQIN;LEE, HYUN, CHUL |
Classification number |
G06F17/27;G06F15/18;G06F17/24 |
Main classification number |
G06F17/27 |
Agency |
|
Agent |
|
Main claim |
|
Address |
|