Title of invention: CORPUS EXPANSION SYSTEM AND METHOD THEREOF
Abstract: A system and method are provided for generating new sample seeds to expand corpora automatically, where sample seeds are used to collect corpus text. New sample seeds are generated from the existing sample seeds and the collected corpora. A corpus expansion strategy is determined from all sample seeds that have already been used and the new sample seeds. The new sample seeds are refined according to the corpus expansion strategy, and the refined seeds are used to collect further corpus text. These steps are repeated until a predefined condition is satisfied. According to the invention, a corpus may be expanded automatically from the web or other resources at low cost and in a convenient way, improving corpus coverage.
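The loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented method: the function name `expand_corpus`, the `collect` callback, and the parameters `max_rounds` and `max_new_seeds` are all hypothetical, and the "expansion strategy" is reduced here to an assumed rule (skip seeds already used, keep the most frequent candidate words).

```python
from collections import Counter
from typing import Callable, Iterable, List, Set


def expand_corpus(
    initial_seeds: Iterable[str],
    collect: Callable[[str], List[str]],
    max_rounds: int = 5,
    max_new_seeds: int = 3,
) -> List[str]:
    """Iteratively grow a corpus from sample seeds (illustrative sketch only)."""
    used_seeds: Set[str] = set()
    seen_docs: Set[str] = set()
    corpus: List[str] = []
    pending = list(dict.fromkeys(initial_seeds))  # de-duplicate, keep order

    for _ in range(max_rounds):  # assumed stop condition: round limit
        if not pending:          # assumed stop condition: no new seeds left
            break

        # Step 1: collect corpus text with the current seeds.
        new_docs: List[str] = []
        for seed in pending:
            for doc in collect(seed):
                if doc not in seen_docs:
                    seen_docs.add(doc)
                    new_docs.append(doc)
            used_seeds.add(seed)
        corpus.extend(new_docs)

        # Step 2: generate candidate seeds from the newly collected text.
        candidates = Counter(
            word for doc in new_docs for word in doc.lower().split()
        )

        # Steps 3 and 4: apply the (assumed) strategy and refine the
        # candidates: drop seeds already used, then keep the most frequent.
        fresh = [(w, c) for w, c in candidates.items() if w not in used_seeds]
        fresh.sort(key=lambda wc: (-wc[1], wc[0]))
        pending = [w for w, _ in fresh[:max_new_seeds]]

    return corpus


# Toy usage: a dictionary stands in for a web search or other resource.
toy_source = {
    "cat": ["cat chases mouse"],
    "mouse": ["mouse eats cheese"],
    "cheese": ["cheese is food"],
}
result = expand_corpus(["cat"], lambda s: toy_source.get(s, []), max_rounds=3)
print(result)
```

In practice the `collect` callback would query a search engine or document store, and the refinement rule would depend on the strategy the system derives from the used and new seeds; both are left abstract here.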
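Publication number: US2008250015(A1) Publication date: 2008.10.09
Application number: US20080138139 Filing date: 2008.06.12
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION Inventors: GUO HONG LEI; ZHANG LI; QIU ZHAO MING; SHEN LI QIN; GUO ZHI LI
Classification: G06F17/30 Main classification: G06F17/30
Agency: Agent:
Independent claim:
Address: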